Microhomology mediated repair of microduplication gene mutations

ABSTRACT

The present invention is directed to the field of gene therapy. In particular, compositions and methods are disclosed that repair gene microduplication mutations by reversion to a wild type sequence. For example, the creation of a double stranded break by a programmable nuclease protein within a microduplication induces the microhomology mediated end joining DNA repair pathway, which in the process of DNA repair removes the microduplication mutation and restores the wild type sequence.

FIELD OF THE INVENTION

The present invention is directed to the field of gene therapy. In particular, compositions and methods are disclosed that repair gene microduplication mutations by reversion to a wild type sequence. For example, the creation of a double stranded break within a microduplication by a programmable nuclease protein induces the microhomology mediated end joining DNA repair pathway, which in the process of DNA repair removes the microduplication mutation and restores the wild type sequence.

BACKGROUND

Genome editing by programmable nuclease systems has revolutionized biological research and is rapidly moving towards many clinical applications. In most instances, the successful repair of an aberrant gene to correct a disease entails precise correction of the genetic sequence, typically via the Homology Directed Repair (HDR) pathway. This pathway requires not only the use of a programmable nuclease to generate a double-strand break (DSB) at the locus to initiate DNA repair, but also the delivery of exogenous donor DNA to precisely re-write the genomic sequence. To date, HDR is inefficient in most cell types, particularly in post-mitotic differentiated cell types such as neurons and muscle [Suzuki, K. et al. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 540, 144-149 (2016).], which are the affected tissues in many devastating genetic disorders. This barrier significantly limits the clinical efficacy of the current generation of nuclease-based gene repair tools.

What is needed in the art are compositions and methods that can safely and efficiently target disease-causing microduplication mutations within a genome and cure the disease by reverting the microduplication mutation to a wild type sequence.

SUMMARY

The present invention is directed to the field of gene therapy. In particular, compositions and methods are disclosed that repair gene microduplication mutations by reversion to a wild type sequence. For example, the creation of a double stranded break within a microduplication by a programmable nuclease protein induces the microhomology mediated end joining DNA repair pathway, which in the process of DNA repair removes the microduplication mutation and restores the wild type sequence.

In one embodiment, the present invention contemplates a programmable nuclease having sequence-specific DNA-binding affinity for a target gene or genomic locus, wherein said target gene or genomic locus comprises a microduplication mutation. In one embodiment, said nuclease further comprises a protospacer adjacent motif binding domain having said sequence-specific DNA-binding affinity for said target gene or genomic locus protospacer adjacent motif sequence. In one embodiment, the nuclease includes, but is not limited to, a Class II CRISPR single effector nuclease, a Cas9 nuclease, a Cas12 nuclease, a zinc finger nuclease and/or a transcription activator-like effector nuclease. In one embodiment, a duplicate sequence of the microduplication mutation has a length of between 1-40 nucleotides. In one embodiment, a duplicate sequence of the microduplication mutation has a length of greater than 40 nucleotides.

In one embodiment, the present invention contemplates a method, comprising: a) providing; i) a subject comprising a target gene or genomic locus having a microduplication mutation; and ii) a pharmaceutical formulation comprising a programmable nuclease, the nuclease having sequence-specific DNA-binding affinity for a region that contains said microduplication mutation of the target gene or genomic locus; and b) administering said pharmaceutical formulation to the patient under conditions such that the microduplication mutation is replaced with a wild type sequence of the target gene or genomic locus. In one embodiment, said wild type sequence replacement comprises a correction through DNA repair. In one embodiment, the DNA repair correction is performed without assistance of an exogenously supplied donor DNA. In one embodiment, said nuclease further comprises a protospacer adjacent motif binding domain having said DNA-binding specificity for said target gene or genomic locus protospacer adjacent motif sequence. In one embodiment, the target gene includes, but is not limited to, TCAP, HPS1, HEXA, DOK7 and/or RAX2. In one embodiment, the subject further exhibits at least one symptom of a disease caused by the target gene microduplication mutation. In one embodiment, the disease includes, but is not limited to, limb-girdle muscular dystrophy 2G, Hermansky-Pudlak syndrome, Tay-Sachs Disease, familial limb-girdle myasthenia and/or cone-rod dystrophy 11. In one embodiment, administering further reduces the at least one symptom of the disease. In one embodiment, the nuclease includes, but is not limited to, a Class II CRISPR single effector nuclease, a Cas9 nuclease, a Cas12 nuclease, a zinc finger nuclease and/or a transcription activator-like effector nuclease. In one embodiment, the pharmaceutical formulation comprises an adeno-associated virus encoding said programmable nuclease.

Definitions

To facilitate the understanding of this invention, a number of terms are defined below. Terms defined herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as “a”, “an” and “the” are not intended to refer to only a singular entity but also plural entities, and also include the general class of which a specific example may be used for illustration. The terminology herein may be used to describe specific embodiments of the invention, but its usage does not delimit the invention, except as outlined in the claims.

The term “about” as used herein, in the context of any assay measurements, refers to +/−5% of a given measurement.

As used herein, the term “CRISPRs” or “Clustered Regularly Interspaced Short Palindromic Repeats” refers to an acronym for DNA loci that contain multiple, short, direct repetitions of base sequences. Each repetition contains a series of bases followed by the same series in reverse and then by 30 or so base pairs known as “spacer DNA”. The spacers are short segments of DNA from a virus and may serve as a ‘memory’ of past exposures to facilitate an adaptive defense against future invasions (PMID 25430774).

As used herein, the term “Cas” or “CRISPR-associated (cas)” refers to genes often associated with CRISPR repeat-spacer arrays (PMID 25430774).

As used herein, the term “Cas9” refers to a nuclease from Type II CRISPR systems, an enzyme specialized for generating double-strand breaks in DNA, with two active cutting sites (the HNH and RuvC domains), one for each strand of the double helix. Jinek combined tracrRNA and spacer RNA into a “single-guide RNA” (sgRNA) molecule that, mixed with Cas9, could find and cleave DNA targets through Watson-Crick pairing between the guide sequence within the sgRNA and the target DNA sequence (PMID 22745249).

As used herein, the term “catalytically active Cas9” refers to an unmodified Cas9 nuclease comprising full nuclease activity.

The term “nickase” as used herein, refers to a nuclease that cleaves only a single DNA strand, either due to its natural function or because it has been engineered to cleave only a single DNA strand. Cas9 nickase variants that have either the RuvC or the HNH domain mutated provide control over which DNA strand is cleaved and which remains intact (Jinek, et al. 2012 (PMID 22745249) and Cong, et al. 2013 (PMID 23287718)).

As used herein, the term “Cas12” (or Cpf1) refers to a nuclease from Type V CRISPR systems, an enzyme specialized for generating double-strand breaks in DNA, with one active cutting site (the RuvC domain) that cuts both DNA strands. Zetsche demonstrated that, when programmed with its crRNA, Cas12 (Cpf1) could find and cleave DNA targets through Watson-Crick pairing between the guide sequence within the crRNA and the target DNA sequence (PMID 26422227).

The term “trans-activating crRNA” or “tracrRNA” as used herein, refers to a small trans-encoded RNA. For example, CRISPR/Cas (clustered, regularly interspaced short palindromic repeats/CRISPR-associated proteins) constitutes an RNA-mediated defense system, which protects against viruses and plasmids. This defensive pathway has three steps. First, a copy of the invading nucleic acid is integrated into the CRISPR locus. Next, CRISPR RNAs (crRNAs) are transcribed from this CRISPR locus. The crRNAs are then incorporated into effector complexes, where the crRNA guides the complex to the invading nucleic acid and the Cas proteins degrade this nucleic acid. There are several pathways of CRISPR activation, one of which requires a tracrRNA, which plays a role in the maturation of crRNA. TracrRNA is complementary to, and base pairs with, a pre-crRNA, forming an RNA duplex. This duplex is cleaved by RNase III, an RNA-specific ribonuclease, to form a crRNA/tracrRNA hybrid. This hybrid acts as a guide for the endonuclease Cas9, which cleaves the invading nucleic acid.

The term “nuclease” as used herein, refers to any protein comprising a pre-determined sequence of amino acids that bind to a specific nucleotide sequence and create a double stranded break. Such nucleases can include, but are not limited to, a Class II CRISPR single effector nuclease, a Cas9 nuclease, a Cas12 nuclease (also known as Cpf1), a zinc finger nuclease (ZFN) protein and/or a transcription activator-like effector nuclease (TALEN). For example, a Class II CRISPR single effector nuclease and/or a Cas9 nuclease may be assembled into a CRISPR complex.

The term “protospacer adjacent motif” (or PAM) as used herein, refers to a DNA sequence that may be required for a Cas9/sgRNA to form an R-loop to interrogate a specific DNA sequence through Watson-Crick pairing of its guide RNA with the genome.

The term “protospacer adjacent motif recognition domain” as used herein, refers to a nuclease C-terminus amino acid sequence having DNA-binding specificity for a target gene PAM sequence.

The term “target gene” as used herein, refers to a specific genomic region, usually comprising at least one allele, whose dysfunction is associated with a disease. For example, a target gene may have a microduplication mutation that is a causative factor for a disease. A microduplication can be composed of a tandem repeat. Tandem repeats in DNA are a pattern of one or more nucleotides that are repeated, where the repetitions are directly adjacent to each other.

As used herein, the term “sgRNA” refers to single guide RNA used in conjunction with CRISPR associated systems (Cas). sgRNAs are a fusion of crRNA and tracrRNA and contain nucleotides of sequence complementary to the desired target site (Jinek, et al. 2012 (PMID 22745249)). Watson-Crick pairing of the sgRNA with the target site permits R-loop formation, which in conjunction with a functional PAM permits DNA cleavage or, in the case of nuclease-deficient Cas9, allows binding to the DNA at that locus.

The term “patient” or “subject”, as used herein, is a human or animal and need not be hospitalized. For example, out-patients and persons in nursing homes are “patients.” A patient may comprise any age of a human or non-human animal and therefore includes both adults and juveniles (i.e., children). It is not intended that the term “patient” connote a need for medical treatment; therefore, a patient may voluntarily or involuntarily be part of experimentation, whether clinical or in support of basic science studies.

The term “affinity” as used herein, refers to any attractive force between substances or particles that causes them to enter into and remain in chemical combination. For example, an inhibitor compound that has a high affinity for a receptor will provide greater efficacy in preventing the receptor from interacting with its natural ligands than an inhibitor with a low affinity.

As used herein, the term “orthogonal” refers to targets that are non-overlapping, uncorrelated, or independent. For example, if two orthogonal Cas9 isoforms were utilized, they would employ orthogonal sgRNAs that only program one of the Cas9 isoforms for DNA recognition and cleavage (Esvelt, et al. 2013 (PMID 24076762)). For example, this would allow one Cas9 isoform (e.g. S. pyogenes Cas9 or spCas9) to function as a nuclease programmed by a sgRNA that may be specific to it, and another Cas9 isoform (e.g. N. meningitidis Cas9 or nmCas9) to operate as a nuclease dead Cas9 that provides DNA targeting to a binding site through its PAM specificity and orthogonal sgRNA. Other Cas9s include S. aureus Cas9 or SaCas9 and A. naeslundii Cas9 or AnCas9.

The term “truncated” as used herein, when used in reference to either a polynucleotide sequence or an amino acid sequence, means that at least a portion of the wild type sequence may be absent. In some cases truncated guide sequences within the sgRNA or crRNA may improve the editing precision of Cas9 (Fu, et al. 2014 (PMID 24463574)).

The term “base pairs” as used herein, refers to specific nucleobases (also termed nitrogenous bases) that are the building blocks of nucleotide sequences that form a primary structure of both DNA and RNA. Double stranded DNA may be characterized by specific hydrogen bonding patterns; base pairs may include, but are not limited to, guanine-cytosine and adenine-thymine base pairs.

The term “genomic locus” or “target gene” as used herein, refers to any pre-determined nucleotide sequence capable of binding to a Cas9 protein contemplated herein. The target may include, but is not limited to, a nucleotide sequence complementary to a programmable DNA binding domain or an orthogonal Cas9 protein programmed with its own guide RNA, a nucleotide sequence complementary to a single guide RNA, a protospacer adjacent motif recognition sequence, an on-target binding sequence and an off-target binding sequence.

The term “on-target binding sequence” as used herein, refers to a subsequence of a specific genomic target that may be completely complementary to a programmable DNA binding domain and/or a single guide RNA sequence.

The term “off-target binding sequence” as used herein, refers to a subsequence of a specific genomic target that may be partially complementary to a programmable DNA binding domain and/or a single guide RNA sequence.

The term “cleavage” or “break” as used herein, may be defined as the generation of a break in the DNA. This could be either a single-stranded break or a double-stranded break depending on the type of nuclease that may be employed.

As used herein, the term “edit”, “editing” or “edited” refers to a method of altering a nucleic acid sequence of a polynucleotide (e.g., a wild type naturally occurring nucleic acid sequence or a mutated naturally occurring sequence) by selective deletion of a specific genomic target or the specific inclusion of new sequence through the use of an exogenously supplied DNA template. Such a specific genomic target includes, but is not limited to, a chromosomal region, mitochondrial DNA, a gene, a promoter, an open reading frame or any nucleic acid sequence.

The term “delete”, “deleted”, “deleting” or “deletion” as used herein, may be defined as a change in either nucleotide or amino acid sequence in which one or more nucleotides or amino acid residues, respectively, are, or become, absent.

As used herein, the terms “complementary” or “complementarity” are used in reference to “polynucleotides” and “oligonucleotides” (which are interchangeable terms that refer to a sequence of nucleotides) related by the base-pairing rules. For example, the sequence “C-A-G-T” is complementary to the sequence “A-C-T-G.” Complementarity can be “partial” or “total.” “Partial” complementarity is where one or more nucleic acid bases are not matched according to the base pairing rules. “Total” or “complete” complementarity between nucleic acids is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This may be of particular importance in amplification reactions, as well as detection methods which depend upon binding between nucleic acids.
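The base-pairing rules in this definition can be illustrated with a minimal Python sketch. The function names `reverse_complement` and `complementarity` are hypothetical, and the sketch assumes both sequences are written 5′ to 3′, are of equal length, and contain only the DNA bases A, C, G and T.

```python
# Watson-Crick pairing rules for DNA (assumption: unmodified bases only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the antiparallel complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementarity(seq_a, seq_b):
    """Classify two equal-length strands as 'total', 'partial' or 'none'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares equal-length sequences")
    expected = reverse_complement(seq_a)
    # Count positions where seq_b matches the perfect partner of seq_a.
    matches = sum(x == y for x, y in zip(expected, seq_b.upper()))
    if matches == len(expected):
        return "total"
    return "partial" if matches > 0 else "none"


print(reverse_complement("CAGT"))
print(complementarity("CAGT", "ACTG"))
```

With the example from the definition, “C-A-G-T” pairs with “A-C-T-G” at every position, so the sketch reports total complementarity.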

The terms “homology” and “homologous” as used herein in reference to nucleotide sequences refer to a degree of complementarity with other nucleotide sequences. There may be partial homology or complete homology (i.e., identity). A nucleotide sequence which is partially complementary, i.e., “substantially homologous,” to a nucleic acid sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid sequence. The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target sequence under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target sequence which lacks even a partial degree of complementarity (e.g., less than about 30% identity); in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.

The terms “homology” and “homologous” as used herein in reference to amino acid sequences refer to the degree of identity of the primary structure between two amino acid sequences. Such a degree of identity may be detected in a portion of each amino acid sequence, or along the entire length of the amino acid sequence. Two or more amino acid sequences that are “substantially homologous” may have at least 50% identity, preferably at least 75% identity, more preferably at least 85% identity, most preferably at least 95%, or 100% identity.

An oligonucleotide sequence which may be a “homolog” may be defined herein as an oligonucleotide sequence which exhibits greater than or equal to 50% identity to a sequence, when sequences having a length of 100 bp or larger are compared.

As used herein, the term “gene” means the deoxyribonucleotide sequences comprising the coding region of a structural gene and including sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of about 1 kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5′ of the coding region and which are present on the mRNA are referred to as 5′ non-translated sequences. The sequences which are located 3′ or downstream of the coding region and which are present on the mRNA are referred to as 3′ non-translated sequences. The term “gene” encompasses both cDNA and genomic forms of a gene. A genomic form or clone of a gene contains the coding region interrupted with non-coding sequences termed “introns” or “intervening regions” or “intervening sequences.” Introns are segments of a gene which are transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or “spliced out” from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. The mRNA functions during translation to specify the sequence or order of amino acids in a nascent polypeptide.

The term “gene of interest” as used herein, refers to any pre-determined gene for which deletion may be desired.

The term “allele” as used herein, refers to any one of a number of alternative forms of the same gene or same genetic locus.

The term “protein” as used herein, refers to any of numerous naturally occurring extremely complex substances (such as an enzyme or antibody) that consist of amino acid residues joined by peptide bonds and contain the elements carbon, hydrogen, nitrogen, oxygen, and usually sulfur. In general, a protein comprises amino acids having an order of magnitude within the hundreds.

The term “peptide” as used herein, refers to any of various amides that are derived from two or more amino acids by combination of the amino group of one acid with the carboxyl group of another and are usually obtained by partial hydrolysis of proteins. In general, a peptide comprises amino acids having an order of magnitude within the tens.

The term “polypeptide” refers to any of various amides that are derived from two or more amino acids by combination of the amino group of one acid with the carboxyl group of another and are usually obtained by partial hydrolysis of proteins. In general, a polypeptide comprises amino acids having an order of magnitude within the tens or larger.

“Nucleic acid sequence” and “nucleotide sequence” as used herein refer to an oligonucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, and represent the sense or antisense strand.

The term “an isolated nucleic acid”, as used herein, refers to any nucleic acid molecule that has been removed from its natural state (e.g., removed from a cell and may be, in a preferred embodiment, free of other genomic nucleic acid).

The terms “amino acid sequence” and “polypeptide sequence” as used herein, are interchangeable and refer to a sequence of amino acids.

As used herein the term “portion” when in reference to a protein (as in “a portion of a given protein”) refers to fragments of that protein. The fragments may range in size from four amino acid residues to the entire amino acid sequence minus one amino acid.

The term “portion” when used in reference to a nucleotide sequence refers to fragments of that nucleotide sequence. The fragments may range in size from 5 nucleotide residues to the entire nucleotide sequence minus one nucleic acid residue.

As used herein, the term “hybridization” may be used in reference to the pairing of complementary nucleic acids using any process by which a strand of nucleic acid joins with a complementary strand through base pairing to form a hybridization complex. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) may be impacted by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the T_(m) of the formed hybrid, and the G:C ratio within the nucleic acids.

As used herein the term “hybridization complex” refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary G and C bases and between complementary A and T bases; these hydrogen bonds may be further stabilized by base stacking interactions. The two complementary nucleic acid sequences hydrogen bond in an antiparallel configuration. A hybridization complex may be formed in solution (e.g., C₀t or R₀t analysis) or between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized to a solid support (e.g., a nylon membrane or a nitrocellulose filter as employed in Southern and Northern blotting, dot blotting or a glass slide as employed in in situ hybridization, including FISH (fluorescent in situ hybridization)).

As used herein, the term “T_(m)” may be used in reference to the “melting temperature.” The melting temperature may be the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. As indicated by standard references, a simple estimate of the T_(m) value may be calculated by the equation: T_(m)=81.5+0.41 (% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl. Anderson et al., “Quantitative Filter Hybridization” In: Nucleic Acid Hybridization (1985). More sophisticated computations take structural, as well as sequence characteristics, into account for the calculation of T_(m).
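The simple T_(m) estimate quoted above can be worked through in a few lines of Python. This is only a sketch of that one equation: the function names `gc_percent` and `simple_tm` are hypothetical, the result is assumed to be in degrees Celsius, and none of the more sophisticated structural corrections mentioned in the definition are applied.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleic acid sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def simple_tm(seq):
    """Estimate T_m as 81.5 + 0.41 * (% G+C), the equation given in the text
    for a nucleic acid in aqueous solution at 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(seq)


# A sequence with 50% G+C content: 81.5 + 0.41 * 50 = 102.0
print(simple_tm("ATGC"))
```

Because the estimate depends only on base composition, two sequences with the same G+C percentage receive the same value regardless of length or base order.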

As used herein the term “stringency” may be used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which nucleic acid hybridizations are conducted. “Stringency” typically occurs in a range from about T_(m) to about 20° C. to 25° C. below T_(m). A “stringent hybridization” can be used to identify or detect identical polynucleotide sequences or to identify or detect similar or related polynucleotide sequences. For example, when fragments are employed in hybridization reactions under stringent conditions, the hybridization of fragments which contain unique sequences (i.e., regions which are either non-homologous to or which contain less than about 50% homology or complementarity) is favored. Alternatively, when conditions of “weak” or “low” stringency are used, hybridization may occur with nucleic acids that are derived from organisms that are genetically diverse (i.e., for example, the frequency of complementary sequences may be usually low between such organisms).

As used herein, the terms “restriction endonucleases” and “restriction enzymes” refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.

DNA molecules are said to have “5′ ends” and “3′ ends” because mononucleotides are reacted to make oligonucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen of its neighbor in one direction via a phosphodiester linkage. Therefore, an end of an oligonucleotide may be referred to as the “5′ end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring. An end of an oligonucleotide may be referred to as the “3′ end” if its 3′ oxygen is not linked to a 5′ phosphate of another mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5′ and 3′ ends. In either a linear or circular DNA molecule, discrete elements are referred to as being “upstream” or 5′ of the “downstream” or 3′ elements. This terminology reflects the fact that transcription proceeds in a 5′ to 3′ fashion along the DNA strand. The promoter and enhancer elements which direct transcription of a linked gene are generally located 5′ or upstream of the coding region. However, enhancer elements can exert their effect even when located 3′ of the promoter element and the coding region. Transcription termination and polyadenylation signals are located 3′ or downstream of the coding region.

As used herein, the term “an oligonucleotide having a nucleotide sequence encoding a gene” means a nucleic acid sequence comprising the coding region of a gene, i.e. the nucleic acid sequence which encodes a gene product. The coding region may be present in a cDNA, genomic DNA or RNA form. When present in a DNA form, the oligonucleotide may be single-stranded (i.e., the sense strand) or double-stranded. Suitable control elements such as enhancers/promoters, splice junctions, polyadenylation signals, etc. may be placed in close proximity to the coding region of the gene if needed to permit proper initiation of transcription and/or correct processing of the primary RNA transcript. Alternatively, the coding region utilized in the expression vectors of the present invention may contain endogenous enhancers/promoters, splice junctions, intervening sequences, polyadenylation signals, etc. or a combination of both endogenous and exogenous control elements.

As used herein, the terms “nucleic acid molecule encoding”, “DNA sequence encoding,” and “DNA encoding” refer to the order or sequence of deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids along the polypeptide (protein) chain. The DNA sequence thus codes for the amino acid sequence.

The term “bind”, “binding”, or “bound” as used herein, includes any physical attachment or close association, which may be permanent or temporary. Generally, an interaction of hydrogen bonding, hydrophobic forces, van der Waals forces, covalent and ionic bonding etc., facilitates physical attachment between the molecule of interest and the analyte being measured. The “binding” interaction may be brief, as in the situation where binding causes a chemical reaction to occur. That is typical when the binding component is an enzyme and the analyte is a substrate for the enzyme. Reactions resulting from contact between the binding agent and the analyte are also within the definition of binding for the purposes of the present invention.

BRIEF DESCRIPTION OF THE FIGURES

The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawings will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.

FIG. 1 shows an overview of several naturally occurring mutagenic DNA repair pathways. Microhomology Mediated End Joining (MMEJ) mediated repair is the center pathway involving a 5′ end resection and annealing of homologous sequences.

FIG. 2 presents exemplary sequences of microduplication targets within a TCAP gene and a HPS1 gene. The microduplicated sequences in each exon are highlighted in green and yellow. The target sites for SpCas9 are shown below each sequence with the GG PAM element in bold and the spacer (guide) sequence underlined. The position of the DSB in the sequence is indicated by a “⋅”.
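The target-site annotation described for FIG. 2 (spacer, GG PAM element and DSB position) can be sketched in Python as a simple scan for candidate SpCas9 sites. Several points here are assumptions rather than statements from this document: the function name `find_spcas9_sites` is hypothetical, the spacer is taken to be 20 nucleotides, the DSB is placed 3 bp upstream of the PAM as is commonly described for SpCas9, and only the strand supplied is searched.

```python
import re


def find_spcas9_sites(seq, spacer_len=20, cut_offset=3):
    """List (spacer, pam, cut_index) for each NGG PAM on the given strand.

    cut_index is the position in seq immediately 3' of the predicted DSB,
    i.e. the break falls between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    sites = []
    # A lookahead lets overlapping PAM candidates be reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full spacer
        spacer = seq[pam_start - spacer_len:pam_start]
        sites.append((spacer, match.group(1), pam_start - cut_offset))
    return sites


# Toy sequence, not taken from TCAP or HPS1.
for spacer, pam, cut in find_spcas9_sites("A" * 20 + "TGG" + "CC"):
    print(spacer, pam, cut)
```

A fuller search would also scan the reverse complement strand, since a guide may target either strand of the locus.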

FIG. 3 presents exemplary data of a TIDE analysis of a Sanger sequence chromatogram from SpCas9 treated LGMD2G iPSCs. The estimated mutagenesis rate is 87% with 53% of the alleles containing an 8 bp deletion (oval), which would be consistent with a reversion to the WT sequence.

FIG. 4 presents exemplary data of an ICE analysis of a Sanger sequence chromatogram from SpCas9 treated HPS1 B-EBV cells. The estimated mutagenesis rate is 41% with 30% of the alleles containing a 16 bp deletion (oval), which would be consistent with a reversion to the WT sequence.

FIG. 5 presents exemplary data of a deep sequencing analysis of the editing rates and outcomes at TCAP in iPSCs or iPSC derived myoblasts that have an 8 bp duplication in both alleles. More than 50% of the alleles are either the precise 8 bp deletion or are mutations that produce an in frame sequence.

FIG. 6 presents exemplary data of TIDE analyses of Sanger sequencing of individual iPSC clones from the 8 bp duplication TCAP line following treatment with a TCAP targeting SpCas9 RNP. In these four instances at least one allele was converted to the wild-type sequence.

FIG. 7 presents exemplary data for a phenotype prediction of iPSC clones based on the genotypes that were observed from the sequencing of the Cas9 modified alleles. 75% of the clones that could be conclusively characterized contained at least one wild-type TCAP sequence.

FIG. 8 presents an overview of duplicated repeat collapse by MMEJ-mediated DNA repair pathways. A nuclease targeted near the center of the duplicated segment can lead to the collapse of the duplication. As disclosed herein, the present method targets this collapse and restores the wild-type sequence.
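The expected sequence outcome of the duplication collapse can be illustrated with a minimal Python sketch. It models only the end result (one copy of a perfect tandem repeat is removed), not the resection and annealing steps of MMEJ. The function names and the toy sequence are hypothetical and are not taken from TCAP or HPS1; the 1 to 40 nucleotide search window simply mirrors one of the embodiments described above.

```python
def find_tandem_duplication(seq, min_unit=1, max_unit=40):
    """Return (start, unit_length) of the longest perfect tandem
    duplication in seq, or None if no repeat unit is found."""
    seq = seq.upper()
    longest = min(max_unit, len(seq) // 2)
    for unit in range(longest, min_unit - 1, -1):
        for start in range(len(seq) - 2 * unit + 1):
            if seq[start:start + unit] == seq[start + unit:start + 2 * unit]:
                return start, unit
    return None


def collapse_duplication(seq):
    """Remove one copy of the longest tandem duplication, mimicking the
    sequence outcome of annealing the two repeats and joining the ends."""
    hit = find_tandem_duplication(seq)
    if hit is None:
        return seq.upper()
    start, unit = hit
    seq = seq.upper()
    return seq[:start + unit] + seq[start + 2 * unit:]


# Toy example: an 8-nt unit duplicated in tandem between flanking bases.
wild_type = "TT" + "GACCAGTC" + "AA"
mutant = "TT" + "GACCAGTC" * 2 + "AA"
print(collapse_duplication(mutant) == wild_type)
```

Note that the search returns the longest perfect repeat it finds, so a real sequence containing short incidental repeats (for example two adjacent identical bases) would need the minimum unit length set to the size of the known microduplication.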

FIG. 9 presents exemplary data showing that a mutant HPS1 allele contains a 16 bp duplication (annotation as in FIG. 2).

FIG. 10 shows that MMEJ-based repair efficiently and precisely corrects a TCAP allele containing an 8-bp duplication.

-   -   FIG. 10A: Schematic of MMEJ-based pathway for repair of a microduplication. A DSB at the centre of a microduplication (in red and blue) is expected to initiate 5′ end resection to expose the microhomologies on each side. These repeats anneal with each other and are repaired via the MMEJ pathway to yield the wild-type (WT) sequence.
    -   FIG. 10B: The pathogenic 8-bp microduplication within TCAP (bold red and blue text) with the SpyCas9 protospacer-adjacent motif (PAM) sequence in the magenta box and the protospacer sequence underlined. A SpyCas9-induced DSB (magenta carets) is expected to drive MMEJ repair to revert the mutant allele to the wild-type sequence (half red/half blue text).
    -   FIG. 10C: Percentage of 8-bp deletion (green bars) and total indels (blue bars) resulting from SpyCas9 RNP treatment of LGMD2G iPSCs homozygous for the 8-bp microduplication or wild-type iPSCs. Bars denote mean and dots indicate individual data points. n=3 biological replicates.
    -   FIG. 10D: Genotype analysis of 22 LGMD2G iPSC clones after treatment with SpyCas9 RNPs. Hypo, hypomorphic allele.
    -   FIG. 10E: Percentage of 8-bp deletion (green bars) and total indels (blue bars) resulting from SpyCas9 treatment of myoblasts derived from patient-derived LGMD2G iPSCs. Bars denote mean and dots indicate individual data points. n=3 biological replicates.

FIG. 11 shows that MMEJ-based repair efficiently and precisely corrects an HPS1 allele containing a 16-bp microduplication.

-   -   FIG. 11A: The 16-bp microduplication repeats are shown in bold red and blue text. For six SpyCas9 guides targeting the microduplication, the PAM sequence is demarcated in the magenta box and the protospacer sequence is underlined. A DSB (magenta carets with distance from the repeat centre indicated) is expected to drive reversion to the wild-type sequence (half red/half blue text). Sequence underlined with red and blue bold lines in target site 6 indicates an alternate 16-bp microhomology within this repeat.
    -   FIG. 11B: Percentage of 16-bp deletions (green) and total indels (blue) for the guides shown in FIG. 11A, based on UMI-based Illumina sequencing. Bars denote mean and dots indicate individual data points. n=3 biological replicates.
    -   FIG. 11C: Percentage of wild-type reverted alleles (16-bp deletion) among all alleles with insertions or deletions (indels) from FIG. 11B. Mean±s.e.m., dots indicate individual data points. n=3 biological replicates.

FIG. 12 presents exemplary data showing that PARP-1 inhibition decreases the efficiency of MMEJ-based repair.

-   -   FIG. 12A: Experimental design. HPS1 B-LCL cells were treated with rucaparib 24 h before and after electroporation with SpyCas9 RNPs targeting the HPS1 locus and collected for subsequent UMI-based Illumina sequencing¹⁵.
    -   FIG. 12B: Percentage of microhomology (MH)-mediated deletion (green) and total indels (blue) in cells treated with SpyCas9 in the presence of 0, 10 or 20 μM rucaparib, measured by UMI-based Illumina deep sequencing. Bars denote mean and dots indicate individual data points. n=3 biological replicates.
    -   FIG. 12C: Percentage of microhomology-mediated deletion alleles among all other alleles with indels from FIG. 12B. Mean±s.e.m., dots indicate individual data points. n=3 biological replicates. ****P=0.00003, unpaired two tailed t-test.
    -   FIG. 12D: Left, alignment of resulting sequences observed by Illumina sequencing upon SpyCas9 RNP treatment of HPS1 B-LCL cells. Right, heatmap showing percentage of alleles generated by SpyCas9 for cells exposed to 0, 10 or 20 μM rucaparib. Gradient scale indicates the percentage occurrence of that sequence.

FIG. 13 presents exemplary data showing that the MMEJ-based approach efficiently achieves precise collapse of endogenous microduplications across various repeat lengths.

-   -   FIG. 13A: Non-pathogenic endogenous microduplications ranging in size from 4 bp to 36 bp. Microduplication repeats are shown as bold red and blue text. The SpyCas9 PAM sequence is shown in the magenta box and the LbaCas12a PAM sequence is shown in the orange box. Anticipated DSBs produced by SpyCas9 and LbaCas12a are denoted by magenta and orange carets, respectively.
    -   FIG. 13B: Percentage of microhomology-mediated deletion (green) and total indels (blue) produced at each endogenous site following SpyCas9 treatment, calculated using UMI-based Illumina sequencing. Bars denote mean and dots indicate individual data points. n=3 biological replicates.
    -   FIG. 13C: Percentage of microhomology-mediated deletions (green) and total indels (blue) produced at three endogenous sites when treated with SpyCas9 or LbaCas12a. Bars denote mean and dots indicate individual data points. n=3 biological replicates.

FIG. 14 shows the disease-causing GATA microduplication (red-blue tandem segment) in the Tay-Sachs HEXA gene.

FIG. 15 presents exemplary data showing indel/deletion ratios subsequent to HEXA gene editing with Cas12a RNPs in Tay-Sachs patient-derived B-EBV cells with a Cas9:sgRNA concentration ratio of 60 pmol protein to 120 pmol guide RNA.

FIG. 16 presents exemplary data showing representative sequence chromatograms of the edited HEXA genes, where sequencing is on the complementary strand.

FIG. 17 presents exemplary data showing indel/deletion ratios subsequent to HEXA gene editing with Cas12a RNPs in Tay-Sachs patient-derived B-EBV cells with a Cas9:sgRNA concentration ratio of 90 pmol protein to 180 pmol guide RNA.

FIG. 18 presents exemplary data showing the effect of rucaparib on the profile of microhomology-mediated deletion products at the AAVS1 locus in patient-derived HPS1 B-LCL cells.

-   -   FIG. 18A: Schematic of two prominent DNA double-strand break repair pathways. A DSB can be repaired through various pathways that produce different DNA sequence end-products. The NHEJ pathway is the dominant DSB repair pathway in most cells. The MMEJ pathway uses end-resection to discover small homologies on each side of the break that can be used to template the fusion of the broken ends. PARP-1 regulates DSB flux through the MMEJ pathway. Treatment of cells with rucaparib, an inhibitor of PARP-1, attenuates DSB flux down the MMEJ repair pathway.
    -   FIG. 18B: Percentage of microhomology-mediated deletions (green) and total indels (blue) resulting from SpyCas9 treatment of cells in the presence of 0, 10 and 20 μM rucaparib. Bars show mean and dots show individual data points from three biological replicates based on UMI-based Illumina deep sequencing.
    -   FIG. 18C: Percentage of 1-bp insertions (purple), microhomology-mediated deletions (green) and other deletions (grey) produced by SpyCas9 RNP with a sgRNA targeting the AAVS1 locus with the addition of increasing amounts of rucaparib. Mean±s.e.m. from three biological replicates based on UMI-based Illumina deep sequencing.
    -   FIG. 18D: Percentage of microhomology-mediated deletions out of total indels in cells treated with SpyCas9 in the presence of rucaparib. Mean±s.e.m., dots represent individual data points from three biological replicates. P values determined using two-tailed unpaired t-test. ***P=0.0004, ****P=6.5×10⁻⁷.
    -   FIG. 18E: Left, alignment of allele sequences obtained from deep sequencing analysis from samples treated with SpyCas9 RNP in the presence of different rucaparib concentrations. Microhomologies present at the AAVS1 locus are shown in red, green and blue. Microhomology-mediated deletion is indicated by two-toned text. Magenta carets indicate site of DSB created by SpyCas9. Inserted bases (ins) are shown in purple, deleted bases (del) are shown as black dashes. Right, heatmap depicting the percentage of alleles generated after SpyCas9 treatment of cells in the presence of different concentrations of rucaparib (0, 10 or 20 μM). The blue colour gradient scale indicates the percentage of occurrence of that sequence. Heatmap represents mean values from a total of three independent biological replicates.

FIG. 19 presents exemplary data showing gene editing with SpyCas9 and LbaCas12a at endogenous microduplications.

-   -   FIG. 19A: Percentage of microhomology-mediated deletions out of total indels at endogenous sites in cells treated with SpyCas9 and LbaCas12a. Mean±s.e.m., dots represent individual data points from three biological replicates.
    -   FIG. 19B: Schematic of endogenous site containing a 24-bp microduplication for SpyCas9 target sites 1-3. The 24-bp microduplication repeats are shown in bold red and blue. The PAM sequence is outlined in magenta and the protospacer sequence is underlined. Magenta carets indicate the site of DSB.
    -   FIG. 19C: Percentage of alleles with 24-bp deletion (green) and total indels (blue) for all three guides from TIDE analysis. Guide 3 produces primarily 23-bp deletions, but not 24-bp deletions, probably because it recuts the collapsed DNA sequence. Bars show the mean from n=3 biological repeats, individual data points are represented by dots.
    -   FIG. 19D: Proportion of the 24-bp deletion out of total indels as individual data points (dots), with mean±s.e.m. n=3 biological repeats.
    -   FIG. 19E: Schematic of endogenous site containing a 27-bp microduplication for SpyCas9 target sites 1 and 2.
    -   FIG. 19F: Percentage of alleles with 27-bp deletion (green) and total indels (blue) for both guides from UMI-based Illumina deep sequencing. Bars show the mean from n=3 biological repeats, individual data points are represented by dots.
    -   FIG. 19G: Proportion of the 27-bp deletion out of total indels as individual data points (dots) with mean±s.e.m. n=3 biological replicates.

FIG. 20 presents exemplary data showing indel populations resulting from SpyCas9 editing at the TCAP locus.

-   -   FIG. 20A: Indel percentages resulting from SpyCas9 RNP treatment in patient-derived iPSCs homozygous for the 8-bp microduplication or in wild-type iPSCs. Mean±s.e.m. from three biological replicates.
    -   FIG. 20B: Breakdown of indel classes resulting from SpyCas9 treatment of myoblasts derived from patient-derived LGMD2G iPSCs. Mean±s.e.m. from three biological replicates.
    -   FIG. 20C: Sequence alignment of the edited alleles resulting from SpyCas9 RNP treatment of LGMD2G iPSCs. Red and blue text indicates DNA repeats that constitute the microduplication, and collapse is indicated by half red and half blue text. Dashes indicate deleted bases and purple text indicates inserted bases. Data are from one biological replicate out of three independent biological replicates.
    -   FIG. 20D: Sequence alignment of the edited alleles resulting from SpyCas9 RNP treatment of myoblasts derived from patient-derived LGMD2G iPSCs. Data are from one biological replicate out of three independent biological replicates.

FIG. 21 presents exemplary data showing PacBio long-read sequencing analysis for SpyCas9-edited LGMD2G iPSCs at the TCAP locus.

-   -   FIG. 21A: Percentage of gene modification observed from PacBio sequencing (one replicate from FIG. 10C out of three biological replicates). Green, alleles containing the 8-bp deletion; grey, other small indels (<100 bp); blue, large insertions (0.14%, not visible on the graph); maroon, large deletions (>100 bp).
    -   FIG. 21B: IGV graphs depicting representative reads obtained for unedited (top) and edited (bottom) LGMD2G iPSCs, spanning a genomic region of about 2,035 bp surrounding the TCAP target site. Red caret indicates the 8-bp deletion site. Data represent one replicate out of three independent biological replicates.

FIG. 22 presents exemplary data showing PacBio long-read sequencing analysis of SpyCas9-edited LGMD2G iPSC clones and a complex colony at the TCAP locus. IGV graphs depict representative reads obtained for clonal isolates of edited LGMD2G iPSCs (FIG. 10D), spanning a genomic region of about 2,035 bp surrounding the TCAP target site. The genotype of the clones (deduced by Illumina deep sequencing) is indicated beside an enlargement of the TCAP target region within the PacBio data. The sequences of the two alleles (listed above the IGV plot) obtained from sequencing are shown with repeats in red and blue. Alleles that reverted to wild-type as a result of collapse of the microduplication are half red/half blue. Bottom, IGV plot for one complex iPSC colony that appears to have been nucleated by more than one cell, with large deletions present in the genome (sizes indicated).

FIG. 23 presents exemplary data showing detection of telethonin expression by flow cytometry in patient-derived cells treated with SpyCas9.

-   -   FIG. 23A: Contour plots from a representative flow cytometry assay to detect telethonin expression in healthy control cells (TCAP^(+/+)), patient cells (TCAP^(−/−)), and SpyCas9-treated homozygous and heterozygous iPS clone-derived myoblasts differentiated for 10 days in culture. Plots are representative of three independent replicates.
    -   FIG. 23B: Histograms from a representative flow cytometry assay to detect telethonin expression. Left, overlay of anti-telethonin antibody staining for four representative samples for different TCAP genotypes. Right, comparison between patient cells and healthy control cells, and SpyCas9-treated homozygous and heterozygous iPS clone-derived myoblasts differentiated for 10 days in culture. Histograms are representative of three independent replicates.
    -   FIG. 23C: Cells were selected by removing cell debris first as shown by gate P1, and then single cells were selected from P1 by removing clustered cells as shown by gate P2. The cells in gate P2 were used for flow analysis. Plots are representative of one biological replicate.
    -   FIG. 23D: Average percentage of telethonin-expressing cells from two technical replicates of three biological replicates. Error bars indicate s.e.m. (n=6) and circles represent individual data points. P values (0.33 for patient versus heterozygous and 0.04 for patient versus homozygous clones) were calculated by two-sided Student's t-test.
    -   FIG. 23E: Western blot showing validation of anti-telethonin antibody (Santa Cruz Biotechnology). Human muscle lysate and lysate from HEK293T cells transfected with haemagglutinin-tagged-telethonin expression construct were separated on an SDS 4-12% acrylamide gradient gel and the resulting blot was probed with anti-telethonin antibody.

FIG. 24 presents exemplary data showing a standard curve generated with genomic DNA of wild-type and HPS1 mutant B-LCLs from UMI-based Illumina deep sequencing. Genomic DNA from wild-type cells and HPS1 cells homozygous for the 16-bp microduplication were mixed at different ratios (x-axis). These mixed DNAs were used for the construction of a UMI-based Illumina library to determine the ratio of the alleles through deep sequencing (y-axis). These data are fitted to a regression line with the R² value reported. n=1 biological replicate.

FIG. 25 presents exemplary data showing an indel spectrum generated by SpyCas9 editing at the HPS1 locus in HPS1 B-LCL cells. Indel spectra of cells treated with SpyCas9 nuclease and different sgRNAs were determined by UMI-based Illumina deep sequencing. Red bar indicates the 16-bp deletion that corresponds to the deletion of one of the microduplication repeats. Data show indel spectra from one representative biological replicate out of three independent biological replicates.

-   -   FIG. 25A: Target site 1.
    -   FIG. 25B: Target site 2.
    -   FIG. 25C: Target site 3.
    -   FIG. 25D: Target site 4.
    -   FIG. 25E: Target site 5.
    -   FIG. 25F: Target site 6.

FIG. 26 presents exemplary data showing pathogenic microduplications and their prevalence in human populations.

-   -   FIG. 26A: Number of insertion variants of length >1 bp that are annotated as Pathogenic or Pathogenic/likely pathogenic in ClinVar. Variants are binned by length, with all those of length 40 bp or greater combined. The insertions (grey) are stratified into progressively finer categories: duplications (red); ‘simple’ duplications (described in text, orange); and the subset of these observed at least once in gnomAD exome/genome databases (green).
    -   FIG. 26B: Number of insertion variants of length >1 bp that are observed at least once in the ‘coding’ regions of the gnomAD exome/genome databases. As above, insertions (grey) are stratified into progressively finer categories: duplications (red); ‘simple’ duplications (orange); the subset of these listed in ClinVar (cyan); and the subset annotated as Pathogenic or Pathogenic/likely pathogenic in ClinVar (green). Cyan and green bars are not visible at this resolution.

FIG. 27 presents an exemplary bioinformatics pipeline for identification of disease alleles. Schematic shows the bioinformatics pipeline used to identify all microduplications amenable to efficient MMEJ-mediated collapse from the ‘coding’ regions (exome_calling_regions.v1; mainly exons plus 50 flanking bases) in the gnomAD genome and exome databases (version 2.0.2). Insertion variants observed in both databases were used for analysis (variants occurring in both databases were counted once). Insertions that do not add a repeat-unit to an existing tandem repeat and are not themselves a perfect repeat were filtered to retain only duplications that spanned 2-40 bp in length and are amenable to CRISPR-Cas9 targeting. This data set was then cross-referenced against the ClinVar database (clinvar_20180225.vcf) to apply further filters for variants reported as pathogenic, which ultimately yielded 143 likely disease-causing microduplications.
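The duplication filter described for FIG. 27 can be illustrated with a minimal Python sketch. This is one possible reading of the filter, not the pipeline actually used: the function names, the 0-based insertion coordinate and the toy sequences are all assumptions introduced for illustration, and real variant files would first need to be parsed and normalised.

```python
def is_perfect_repeat(s: str) -> bool:
    """True if s consists of two or more copies of a shorter unit (e.g. 'CACA')."""
    n = len(s)
    return any(n % k == 0 and s == s[:k] * (n // k) for k in range(1, n))


def tandem_copies(seq: str, start: int, unit_len: int) -> int:
    """Count adjacent copies of seq[start:start+unit_len], extending both ways."""
    unit = seq[start:start + unit_len]
    copies = 1
    left = start
    while left - unit_len >= 0 and seq[left - unit_len:left] == unit:
        copies += 1
        left -= unit_len
    right = start + unit_len
    while seq[right:right + unit_len] == unit:
        copies += 1
        right += unit_len
    return copies


def is_simple_duplication(ref: str, pos: int, ins: str,
                          min_len: int = 2, max_len: int = 40) -> bool:
    """Hypothetical filter: does inserting `ins` before ref[pos] create a
    2-40 bp tandem duplication that is not part of a longer repeat array?"""
    n = len(ins)
    if not (min_len <= n <= max_len) or is_perfect_repeat(ins):
        return False
    mut = ref[:pos] + ins + ref[pos:]
    found = False
    # The same duplication can be written at several positions with a rotated
    # inserted sequence, so every window overlapping the insertion is checked.
    for s in range(max(0, pos - n), pos + 1):
        if mut[s:s + n] == mut[s + n:s + 2 * n]:
            if tandem_copies(mut, s, n) > 2:
                return False  # adds a unit to an existing tandem repeat
            found = True
    return found
```

Scanning every window that overlaps the insertion is a design choice that makes the check independent of how the variant caller aligned the insertion; cross-referencing against ClinVar annotations would be a separate step not shown here.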

DETAILED DESCRIPTION OF THE INVENTION

The present invention is directed to the field of gene therapy. In particular, compositions and methods are disclosed that repair gene microduplication mutations by reversion to a wild type sequence. For example, the creation of double stranded breaks by a nuclease protein induces the microhomology mediated end joining DNA repair pathway that corrects the microduplication back to the wild type sequence without the assistance of an exogenously supplied donor DNA.

In one embodiment, the present invention contemplates a subset of disease-causing alleles within the human population that are the product of small duplications (microduplications of 1 to 40 base pairs) within a gene sequence. These alleles occur in human subpopulations with substantial frequencies and result in rare diseases such as Limb Girdle Muscular Dystrophy 2G (LGMD2G) [Nigro, V. & Savarese, M. Genetic basis of limb-girdle muscular dystrophies: the 2014 update. Acta Myol 33, 1-12 (2014)], Tay-Sachs Disease [Fernandes Filho, J. A. & Shapiro, B. E. Tay-Sachs disease. Arch. Neurol. 61, 1466-1468 (2004).], and Hermansky-Pudlak syndrome (HPS) [El-Chemaly, S. & Young, L. R. Hermansky-Pudlak Syndrome. Clin. Chest Med. 37, 505-511 (2016).] among others (Table 1).

In one embodiment, the present invention contemplates a method demonstrating that disease-causing microduplications can be reverted to the wild-type sequence simply through the generation of a DSB near the center of the duplication, enabling development of simplified Cas9-based therapeutic interventions tailored to each disorder. Our discovery was based initially on the theoretical idea that a nuclease-generated DSB would harness a common cellular DNA repair pathway, microhomology mediated end joining (MMEJ). Sfeir et al., “Microhomology-Mediated End Joining: A Back-up Survival Mechanism or Dedicated Pathway?” Trends Biochem Sci 40:701-714 (2015). MMEJ utilizes small regions of sequence homology on each side of the break to collapse the DNA sequence. See, FIG. 8. Microhomology-mediated collapse of sequences has been observed in programmable nuclease editing data for some time. Bae et al., “Microhomology-based choice of Cas9 nuclease target sites” Nature Methods 11:705-706 (2014). However, the realization that it could be applied to disease correction to achieve highly efficient reversion to the wild-type sequence has not been described.

The data presented herein demonstrate a successful, efficient correction of disease-causing alleles in patient-derived cell lines harboring microduplications including, but not limited to, TCAP (LGMD2G) and HPS1 (HPS). The data show that this correction can be successfully performed in iPSCs, stem cell progenitor cells and adult somatic cells, opening up multiple routes for the delivery of a nuclease-based therapy. Based on a computational analysis of human allele variants described herein, more than 100 diseases have been identified that should be amenable to this type of genetic correction. As the introduced nuclease is programmed to target a mutant DNA sequence, the reverted wild-type sequence is not a substrate, and thus should be stable even in the presence of the nuclease. Furthermore, microhomology-mediated correction does not require a DNA cassette to regenerate the wild-type sequence, only the transient delivery of the nuclease (e.g. Cas9 and its sgRNA) to target the locus. The demonstrated high rate of correction for these two distinct genetic disorders suggests that our correction approach will have broad application to a wide variety of important genetic disorders associated with microduplications for which there are no therapeutics currently available, providing patients with a definitive cure.

Although it is not necessary to understand the mechanism of an invention, it is believed that targeting a double strand break to a microduplication can cause the collapse of the microduplication back to the wild-type sequence with high efficiency and that this might be used to correct disease alleles without the need for a DNA repair template.

Current programmable nuclease-based methods (for example, CRISPR-Cas9) for precise correction of a disease-causing genetic mutation harness the homology-directed repair pathway. However, this repair process requires co-delivery of an exogenous DNA donor to recode the sequence and can be inefficient in many cell types. In some embodiments, the present invention contemplates disease-causing frameshift mutations resulting from microduplications which can be efficiently reverted to the wild-type sequence simply by generating a double-stranded break near the centre of the duplication. This has been demonstrated herein using patient-derived cell lines for, for example, limb-girdle muscular dystrophy 2G (LGMD2G)¹, Hermansky-Pudlak syndrome type 1 (HPS1)² and Tay-Sachs Disease. Clonal analysis of induced pluripotent stem cells (iPSCs) from the LGMD2G cell line, which contain a mutation in TCAP, treated with the Streptococcus pyogenes Cas9 (SpyCas9) nuclease revealed that about 80% contained at least one wild-type TCAP allele; this correction also restored TCAP expression in LGMD2G iPSC-derived myotubes. SpyCas9 also efficiently corrected the genotype of an HPS1 patient-derived B-lymphoblastoid cell line. Inhibition of polyADP-ribose polymerase 1 (PARP-1) suppressed the nuclease-mediated collapse of the microduplication to the wild-type sequence, confirming that precise correction is mediated by the microhomology-mediated end joining (MMEJ) pathway. Analysis of editing by SpyCas9 and Lachnospiraceae bacterium ND2006 Cas12a (LbaCas12a) at non-pathogenic 4-36-base pair microduplications within the genome indicates that the correction strategy is broadly applicable to a wide range of microduplication lengths and can be initiated by a variety of nucleases. Finally, LbaCas12a was employed to achieve precise correction of the four base pair duplication in HEXA in a Tay-Sachs patient-derived B-lymphoblastoid cell line. The simplicity, reliability and efficacy of this MMEJ-based therapeutic strategy should permit the development of nuclease-based gene correction therapies for a variety of diseases that are associated with microduplications.

I. Double Stranded Deoxyribonucleotide Break Repair Mechanisms

MMEJ is an error-prone double-stranded break (DSB) DNA repair pathway that uses regions of microhomology (2-25 bp) on each side of a DSB to define the boundaries at which DNA segments are rejoined³. This mutagenic process generates deletions that result in the loss of one of the repeat sequences and the intervening region. See FIGS. 10A and 18. Hallmarks of MMEJ repair on DNA products generated through editing of programmable nucleases have been observed in a variety of cell types and their effect on gene inactivation rates has been appreciated^(4,5). The MMEJ pathway has also been harnessed for the targeted insertion of exogenous donor DNAs in mammalian cells and zebrafish and frog embryos^(6,7). Herein, a nuclease-based therapeutic approach is described that harnesses the MMEJ pathway to precisely correct frameshift mutations resulting from microduplications (e.g., tandem duplications). It was reasoned that MMEJ-based repair of a programmable nuclease-induced DSB near the centre of a disease-causing microduplication would achieve precise reversion to the wild-type genomic sequence. This strategy might be an effective alternative to homology-directed repair-based gene correction approaches and would not require co-delivery of a donor DNA. Furthermore, the reverted wild-type sequence would no longer be complementary to the single-guide RNA (sgRNA) targeting the microduplication, leading to stable correction even in the presence of Cas9 nuclease.
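The reasoning in the preceding paragraph can be made concrete with a minimal Python sketch. The sequences, the function names and the 12-nt toy protospacer below are invented for illustration and are not taken from this disclosure; the sketch ignores PAM requirements, the reverse strand and mismatch tolerance, and models only the idealized outcome in which exactly one repeat copy is lost.

```python
def collapse_microduplication(seq: str, dup_start: int, unit_len: int) -> str:
    """Return seq with one copy of a perfect tandem duplication removed.

    dup_start is the 0-based index of the first repeat copy; the second
    copy is expected to start at dup_start + unit_len.
    """
    first = seq[dup_start:dup_start + unit_len]
    second = seq[dup_start + unit_len:dup_start + 2 * unit_len]
    if len(second) != unit_len or first != second:
        raise ValueError("no perfect tandem duplication at this position")
    return seq[:dup_start + unit_len] + seq[dup_start + 2 * unit_len:]


def guide_still_matches(seq: str, protospacer: str) -> bool:
    """Toy check: is the protospacer still present on the given strand?"""
    return protospacer in seq


# Hypothetical toy alleles: an 8-nt unit duplicated in tandem.
UNIT = "GATTACAG"
WILD_TYPE = "ACGT" + UNIT + "TTCC"
MUTANT = "ACGT" + UNIT + UNIT + "TTCC"

# A toy protospacer spanning the junction between the two repeat copies.
PROTOSPACER = MUTANT[6:18]

REVERTED = collapse_microduplication(MUTANT, dup_start=4, unit_len=len(UNIT))
```

In this toy case the reverted sequence equals the wild-type sequence and no longer contains the junction-spanning protospacer, which mirrors the argument that the corrected allele is not a substrate for the nuclease.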

To evaluate the efficacy of the presently contemplated MMEJ-based correction strategy, LGMD2G and HPS1 were selected as exemplary diseases that affect different human tissues and whose causes include pathogenic microduplications of different lengths. Both of these diseases are autosomal recessive disorders that are represented at modest frequencies in different human subpopulations and currently have no treatments. One of the disease alleles identified in LGMD2G patients features an 8-bp duplication in exon 1 of TCAP, a mutation that is found in the East Asian population at a frequency of approximately 1 in 1,000 alleles. TCAP encodes the telethonin protein, a 19-kDa cardiac and striated muscle-specific structural protein located in the Z-disc of sarcomeres that links titin proteins to stabilize the contractile apparatus for muscle contraction⁸. Homozygous or compound heterozygous inactivating mutations in TCAP manifest as severe muscle atrophy and cardiomyopathy that typically develop during late adolescence into early adulthood^(1,9).

The double strand breaks (DSBs) that are generated within the genomes of eukaryotic systems are potentially repaired by a number of different DNA-damage response pathways such as canonical non-homologous end joining (cNHEJ), homologous recombination (HR), and alternate non-homologous end joining (aNHEJ). McVey et al., “MMEJ repair of double-strand breaks (director's cut): deleted sequences and alternative endings” Trends in Genetics 24:529-538 (2008). cNHEJ is a precise repair pathway where ends are rejoined and typically reconstitute the original DNA sequence. HR uses a DNA template with homology to sequences flanking the DSB to copy a homologous sequence to repair the broken site. aNHEJ is a mutation prone process that utilizes resection of 5′ ends of the DSB to complete the repair. The Microhomology Mediated End Joining (MMEJ) pathway involves rejoining the DNA ends using short regions of homology on each side of the break (e.g., usually >2 bases) where the intervening sequence is deleted. See, FIG. 1.

When artificial nucleases are introduced into the cell to target the genome, the DSBs that are generated are likely to proceed down the cNHEJ pathway where they are precisely repaired, which restores the existing nuclease target sequence, whether wild type or mutated. Eventually, however, mutations are inevitably generated that disrupt the target site. Sequencing information on these deletions suggests that in many instances the resulting deletion mutations are generated by MMEJ, due to the sequence scars that contain microhomologies that are present on both sides of the break. Analysis of Cas9 nuclease DNA target sequences suggests that there is a correlation between the efficiency of collapse and the length of the microhomology on each side of the break. Bae et al., “Microhomology-based choice of Cas9 nuclease target sites” Nature Methods 11:705-706 (2014).
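As a rough illustration of how microhomologies flanking a break might be enumerated, the following Python sketch lists identical stretches that lie on opposite sides of a cut position. It is a simplified enumeration written for this discussion, not the scoring scheme of Bae et al.; the function name, the `window` parameter and the toy sequence are assumptions.

```python
def find_microhomologies(seq: str, cut: int, min_len: int = 2, window: int = 30):
    """List (homology_length, deletion_length, homology_sequence) tuples.

    A hit is a stretch seq[i:i+k] lying entirely left of the cut that is
    identical to a stretch seq[j:j+k] starting at or right of the cut.
    Joining at such a pair would delete j - i bases (one homology copy
    plus the intervening sequence). Results are sorted longest first.
    """
    lo = max(0, cut - window)
    hi = min(len(seq), cut + window)
    hits = []
    for i in range(lo, cut):
        for j in range(cut, hi):
            # Skip pairs that merely continue a longer match starting one
            # base earlier, so only left-maximal matches are reported.
            if i > lo and j > cut and seq[i - 1] == seq[j - 1]:
                continue
            k = 0
            while i + k < cut and j + k < hi and seq[i + k] == seq[j + k]:
                k += 1
            if k >= min_len:
                hits.append((k, j - i, seq[i:i + k]))
    return sorted(hits, reverse=True)


# Hypothetical toy allele with an 8-nt tandem duplication, cut at its centre.
toy = "ACGT" + "GATTACAG" * 2 + "TTCC"
hits = find_microhomologies(toy, cut=12)
```

For a cut at the centre of a perfect tandem duplication, the longest hit in this toy example is the repeat unit itself, with a predicted deletion equal to the unit length; shorter incidental homologies also appear further down the list.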

DSBs at most genomic sites are repaired primarily through the NHEJ pathway, which can produce small insertions or deletions during imprecise repair (for example, AAVS1).²⁸ See, FIG. 18. The data presented herein, which span DSBs in twelve sequences, indicate that microduplications are preferentially repaired via the MMEJ pathway, which yields predictable and efficient collapse. For this class of pathogenic mutations, precise repair via the MMEJ pathway provides a favourable alternative to homology-directed repair, which is inefficient in many cell types²⁹. Consistent with the present findings, MMEJ-mediated repair was recently used to efficiently correct the pathogenic microduplication associated with HPS1³⁰. Although the use of allele frequencies from gnomAD can help to prioritize potential targets for MMEJ-based repair, this underestimates the extent of genetic diseases, particularly dominant ones, caused by microduplications.

II. Microhomology Mediated End Joining Disease Mutation Repair

To test the generality of the presently contemplated MMEJ-based repair approach and the range of sequence lengths over which duplication collapse is efficient, editing products generated by SpyCas9 targeting endogenous microduplications within the human genome were evaluated.

A bioinformatic analysis was performed to identify non-pathogenic, unique endogenous microduplications ranging from 4 bp to 36 bp in length in the human genome. See, FIG. 13A. The efficiency of microduplication collapse resulting from a SpyCas9-produced DSB at the centre of the microduplications in HEK 293T cells at these sites was examined. Although the bulk editing rate varied across these target sites, it was consistently found that duplication collapse was the major end-product within the edited alleles (ranging from 45% to 93%), regardless of the microduplication length. See, FIG. 13B and FIG. 19A. Consistent with the analysis at an HPS1 locus, a decrease in the duplication collapse efficiency was observed for 24- and 27-bp-long microduplications as cut sites were moved away from the centre. See, FIGS. 19B-19G.
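The two quantities discussed above, the bulk editing rate and the share of edited alleles that are the precise collapse product, can be computed from allele calls as in the following minimal Python sketch. The function name and the example reads are hypothetical, and any read that differs from the unedited allele is counted as edited, which is a simplification (sequencing errors and substitutions are not handled).

```python
from collections import Counter


def summarize_editing(reads, unedited: str, collapsed: str) -> dict:
    """Summarize allele calls for one target site.

    reads     : iterable of allele sequences over the same amplicon window
    unedited  : the allele that still carries the microduplication
    collapsed : the allele expected after precise repeat collapse
    """
    counts = Counter()
    for read in reads:
        if read == unedited:
            counts["unedited"] += 1
        elif read == collapsed:
            counts["collapse"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    edited = counts["collapse"] + counts["other"]
    return {
        # share of all alleles that were modified (bulk editing rate)
        "total_edited_pct": 100 * edited / total if total else 0.0,
        # share of all alleles that are the precise collapse product
        "collapse_pct": 100 * counts["collapse"] / total if total else 0.0,
        # share of the edited alleles that are the precise collapse product
        "collapse_among_edited_pct": 100 * counts["collapse"] / edited if edited else 0.0,
    }


# Hypothetical allele calls, for illustration only.
example = ["DUP"] * 4 + ["WT"] * 5 + ["DEL1"]
summary = summarize_editing(example, unedited="DUP", collapsed="WT")
```

Reporting the collapse fraction relative to edited alleles, rather than to all alleles, separates the outcome of repair from the cutting efficiency of a given guide.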

Whereas SpyCas9 generates blunt DSBs, the type V CRISPR-Cas nuclease Cas12a generates DSBs with 5′ overhangs¹⁹. It was then investigated whether LbaCas12a-generated breaks might be preferentially repaired by a resection-dependent pathway such as MMEJ by comparing the efficiency of microduplication collapse engendered by SpyCas9 and LbaCas12a nucleases at three endogenous sites. Efficient repeat collapse (50-90% of edited alleles) could be achieved with LbaCas12a at all three of these sites, with efficiencies similar to those of SpyCas9. See, FIG. 13C and FIG. 19A. In addition, LbaCas12a could drive repeat collapse for the four base pair duplication in HEXA that is associated with Tay-Sachs Disease. See, FIGS. 14-17. Overall, these data demonstrate that the MMEJ-based editing approach can be used to efficiently collapse microduplications up to lengths of at least 36 bp using either Cas9 or Cas12a programmable nucleases.

Recently, an algorithm has been developed to more reliably predict target loci that would be predisposed to generate a more homogeneous mutant allele population through MMEJ. Ata et al., “Toward Precision Molecular Surgery: Robust, Selective Induction of Microhomology-mediated End Joining in vivo” BioRxiv, (posted online Mar. 28, 2018). The goal of this algorithm was to identify sites in genes where a double strand break (DSB) will be repaired through the use of microhomologies on each side of the break, collapsing the DNA sequence such that it is out-of-frame with regard to its translation and thus will not produce a functional protein. Termed the “MENTHU” algorithm, it appears primarily to be a way of post-processing predictions generated from an earlier reported algorithm (Bae et al., nature.com/articles/nmeth.3015 (2014)) to improve the prediction for when a DSB will be repaired by MMEJ in a fairly homogeneous way. This is useful if one wants to do precision genome editing, whereas Bae et al. were considering the blunter application of making (any) out-of-frame deletions for gene knock-out.

The Ata et al. paper invites users to access this algorithm to facilitate the scanning of reference wild-type genes using Genbank IDs or RefSeq IDs to identify sites that will collapse primarily through a single MMEJ event down to a specific sequence. genesculpt.org/menthu/. The Ata et al. algorithm is designed for the primary application of making knockouts in model organisms, e.g. the source-code repository for MENTHU has the subtitle “MENTHU knockout site recommender”. Consequently, Ata et al. discloses making mutants in zebrafish embryos through the injection of a programmable nuclease (TALENs or SpCas9), and then analyzing the resulting genetic products and phenotypes of these mutant animals.

Although the MENTHU algorithm appears to be set up to analyze genes, in principle any DNA sequence can be evaluated, e.g. with variant alleles and flanking sequence, but this is user dependent and not a function of the algorithm. In addition, most of the known pathogenic variant alleles that are duplications cause frame-shifts, and the algorithm is not set up to define going from an out-of-frame sequence to an in-frame sequence, let alone restoring the wild-type sequence.

In some embodiments, the present invention contemplates an alternative method that is focused on capturing abutting duplications within the ExAC or gnomAD databases that may be suitable for MMEJ repair. gnomAD is a database of variants identified in whole-genome and whole-exome sequencing data aggregated from many large-scale projects, and it subsumes the earlier ExAC exome-only database. Importantly, the basic representation of variants in gnomAD lists the genomic position, the reference (REF) sequence starting at that position, and the alternate (ALT) sequence starting at that position. It is not typically readily apparent whether a variant is a duplication, because typically only the base immediately preceding an insertion is used as the reference allele, whereas establishing whether the inserted sequence is a duplication requires examining more of the flanking regions (e.g. the HEXA duplication has REF=G, ALT=GGATA, where only a single copy of the duplication is present). See, Tables 5 and 6. Furthermore, the gnomAD webpages and downloadable vcf files are not compatible with the MENTHU program in their raw form: the webpages for variants show surrounding genomic reference sequence only as a PNG graphic (not in text form) or via links to the UCSC genome browser for the reference genome; the vcf files sometimes indicate when variants are duplications in the HGVSc fields added by Ensembl VEP, but again these files do not directly provide the sequence of the duplication (both reference copy and extra inserted copy) together with enough genomic flanking sequence to use for identifying cleavage sites that would be suitable for MMEJ. This alternative technology rebuilt the surrounding genomic sequence and identified common positions for nuclease cleavage around these duplications that could be tested to achieve collapse of the duplication and restore the wild-type sequence. This concept is not mentioned anywhere in the MENTHU manuscript or algorithm.
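The flank check described above can be illustrated with a minimal sketch. It assumes a left-aligned VCF-style insertion and a reference held as a plain string; the function name `rebuild_insertion`, the `flank` parameter and the toy reference (derived from the HEXA guide sequence of Table 3 with one GATA copy removed, strand orientation assumed) are illustrative and are not taken from gnomAD or MENTHU tooling.

```python
def rebuild_insertion(reference: str, pos: int, ref: str, alt: str, flank: int = 10):
    """Rebuild wild-type and variant sequence around a VCF-style insertion.

    reference: reference sequence as a plain string (0-based indexing)
    pos:       0-based index of the REF base (the base just before the insertion)
    Returns (wild_type_window, variant_window, is_duplication).
    """
    if (reference[pos:pos + len(ref)] != ref
            or not alt.startswith(ref)
            or len(alt) <= len(ref)):
        raise ValueError("REF/ALT do not describe an insertion at this position")
    inserted = alt[len(ref):]
    point = pos + len(ref)                      # insertion point in the reference
    lo = max(0, pos - flank)
    hi = min(len(reference), point + len(inserted) + flank)
    wild_type = reference[lo:hi]
    variant = reference[lo:point] + inserted + reference[point:hi]
    # For a left-aligned variant, a duplication repeats the bases immediately 3'.
    is_dup = reference[point:point + len(inserted)] == inserted
    return wild_type, variant, is_dup


# Toy reference (assumption): only the REF/ALT alleles come from the text above.
TOY_REFERENCE = "CTTCCAGTCAGGGCCATAGGATAT"
wild_type, variant, is_dup = rebuild_insertion(TOY_REFERENCE, 18, "G", "GGATA")
print(wild_type)
print(variant)
print(is_dup)
```

The REF and ALT alleles alone do not show that GATA is duplicated; the check only succeeds once the four reference bases 3′ of the insertion point are compared with the inserted bases.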

In addition, the present invention, unlike the Ata et al. algorithm, captures allele frequencies that allow the prioritization of potential targets based on the associated diseases, where the information on pathogenicity is extracted from the ClinVar database and combined with gnomAD and 1000 Genome Project phase 3 databases to determine how common the variants are overall and in specific human subpopulations.

Thus, the embodiments contemplated herein represent a completely novel analysis of a human genome variant database to extract information on disease alleles that may be amenable to gene correction by replacing microduplication mutation sequences with their requisite wild type sequences via an MMEJ strategy.

III. Gene Microduplication Diseases

There are a number of diseases that have causative alleles within the human population that are associated with microduplications within the genome. See, Table 1.

TABLE 1 Exemplary Disease alleles associated with microduplications

Disease | Locus | Duplication | dbSNP ID | ClinVar ID | gnomAD allele frequency
LGMD2G | TCAP | CGAGGTGT | rs778568339 | ND | 8.126e-5
HPS | HPS1 | CCAGCAGGGGAGGCCC | rs281865163 | 5277 | 2.845e-5
Tay-Sachs | HEXA | GATA | rs387906309 | 3889 | 0.0008041
familial limb-girdle myasthenia | DOK7 | GCCT | rs764365793 | 1273 | 0.0006367
Cone-rod dystrophy 11 | RAX2 | CCCGGG | rs549932754 | 1242 | 0.0007684

There are likely to be many more microduplications that are associated with diseases, but most disease phenotypes have not been linked to a specific microduplication. For example, ~90% of GWAS disease-associated SNPs are found in non-coding sequences, though which variants are themselves causal, as opposed to being in linkage disequilibrium with causal variants, is in many cases not yet known. Hindorff et al., “Potential etiologic and functional implications of genome-wide association loci for human diseases and traits” Proceedings of the National Academy of Sciences 106:9362-9367 (2009). In addition, repeat expansion diseases (e.g., Huntington's disease, ALS [C9ORF72], etc.) could be thought of as extended microduplications as well.

In one embodiment, the present invention contemplates a method for reverting a gene comprising a nucleotide microduplication mutation to a wild-type sequence. In one embodiment, the method comprises generating a DSB near the center of the nucleotide microduplication. In one embodiment, the nucleotide microduplication causes a disease. In one embodiment, the DSB is created by targeting a nuclease to the nucleotide microduplication center. In one embodiment, the nuclease includes, but is not limited to, Cas9, CRISPR, Cas12 (Cpf1), zinc finger nucleases and/or TALEN. Although it is not necessary to understand the mechanism of an invention, it is believed that since the nuclease is targeting a mutated sequence, once the mutation reversion to a wild type sequence has occurred, the repaired target sequence would no longer be recognized by the nuclease, and thus remains a wild type sequence in the presence of the repairing nuclease. It is further believed that a correction DNA cassette is not needed for an MMEJ repair back to the wild-type sequence, only the nuclease (e.g. Cas9) and a targeting moiety having affinity for the mutant locus (e.g. sgRNA).
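The reasoning above (collapse of one repeat copy restores the wild type, after which the guide no longer matches) can be sketched with a deliberately simplified model. The function `mmej_product`, the `arm_len` parameter and the `DOWNSTREAM` flank are hypothetical; the duplication and guide are taken from Tables 1 and 2, and the upstream bases are inferred from the first ten nucleotides of the TCAP guide. Real MMEJ outcomes depend on resection and competing repair pathways that this toy ignores.

```python
def mmej_product(seq: str, cut: int, arm_len: int = 4):
    """Toy MMEJ model: anneal the arm left of the cut to its next copy on the right.

    Returns the repaired sequence, or None if no matching microhomology is found.
    """
    if cut < arm_len:
        return None
    arm = seq[cut - arm_len:cut]        # microhomology exposed left of the break
    match = seq.find(arm, cut)          # nearest identical stretch right of the break
    if match == -1:
        return None
    # Keep everything up to the cut; drop the intervening bases and one arm copy.
    return seq[:cut] + seq[match + arm_len:]


UPSTREAM = "AGCTGAGCTG"           # inferred from the TCAP guide (Table 2)
UNIT = "CGAGGTGT"                 # TCAP duplication (Table 1)
DOWNSTREAM = "ATCCTA"             # hypothetical flank, for illustration only
GUIDE = "AGCTGAGCTGCGAGGTGTCG"    # TCAP SpCas9 guide (Table 2)

wild_type = UPSTREAM + UNIT + DOWNSTREAM
mutant = UPSTREAM + UNIT + UNIT + DOWNSTREAM

cut = mutant.find(GUIDE) + 17     # SpCas9 cuts 3 bp upstream of the PAM
repaired = mmej_product(mutant, cut)

print(repaired == wild_type)                # does the collapse restore the toy wild type?
print(GUIDE in mutant, GUIDE in repaired)   # does the guide still have a target?
```

In this toy the guide spans the junction between the two repeat copies, so the target site exists only in the mutant sequence and disappears once one copy is removed.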

The data presented herein describe successful correction of disease-causing alleles in patient-derived cell lines harboring microduplications in TCAP and HPS1. A high rate of correction is achieved in patient cell lines through the delivery of a nuclease, suggesting that nuclease-induced MMEJ repair of microduplications within a genome can be programmed for other gene microduplication targets causing other diseases (e.g. HEXA in Tay-Sachs disease, other diseases in Table 1), leading to cures for these diseases.

A. TCAP and HPS1 Microduplication Repair

The site-specific nuclease, S. pyogenes Cas9 (SpCas9), was used in the following therapeutic gene editing method. Jinek et al., “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity” Science 337:816-821 (2012). Nonetheless, it is contemplated herein that similar results can be obtained using any Cas9 (or CRISPR), a Cpf1 nuclease or any other programmable nuclease system including, but not limited to, zinc finger nuclease (ZFN), TALEN, mega-TAL or meganuclease, all of which can be targeted to a gene microduplication sequence. The data presented herein show two different proof-of-principle targets (TCAP and HPS1) that may have therapeutic value. Both of these diseases are associated with substantial morbidity, and no curative therapies are currently available.

The mutant TCAP allele contains an 8 base duplication that leads to an out of frame coding sequence. UCSC: genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&highlight=hg19.chr17%3A37821635-37821635&position=chr17%3A37821610-37821660. This TCAP allele has a frequency of ~1 in 1000 in the East Asian population. gnomad.broadinstitute.org/variant/17-37821635-G-GCGAGGTGT. Individuals with homozygous inactivating mutations in TCAP have Limb Girdle Muscular Dystrophy 2G (LGMD2G).

The mutant HPS1 allele contains a 16 bp duplication that leads to an out of frame coding sequence. gnomad.broadinstitute.org/variant/10-100183554-T-TGGGCCTCCCCTGCTGG. This allele has a frequency of ~0.1 in 21 in populations within Puerto Rico. MPH, S. E.-C. M. & MD, L. R. Y. Hermansky-Pudlak Syndrome 1-7 (2017). Individuals with homozygous inactivating mutations in HPS1 have Hermansky-Pudlak syndrome (HPS1).

B-EBV cells obtained from a patient homozygous for the 16 bp microduplication in the HPS1 gene were treated with SpCas9 by nucleofection. ICE (similar to TIDE¹⁶) analysis was performed on the Sanger sequence chromatogram from the SpCas9 treated HPS1 B-EBV cells. The estimated mutagenesis rate is 41%, with 30% of the alleles containing a 16 bp deletion (red arrow), which is a reversion to the wild-type sequence. Zero on the x-axis indicates no change in the length or composition of the 16 bp microduplication. See, FIG. 9.

Patient-derived cells were obtained to test the potential for a nuclease targeting these duplications to revert the duplicated mutant allele. For TCAP, iPSCs were derived from fibroblasts from an individual who is a homozygous carrier of the 8 bp duplication. See, FIG. 3. For HPS1, B-EBV cells generated from a patient who is homozygous for the 16 bp duplication in HPS1 were purchased from Coriell [GM14606]. For the nuclease system, a 3xNLS-SpCas9 protein was used, complemented with sgRNAs (Synthego) for the target sequences. See, Table 2.

TABLE 2 Guide Sequences in sgRNAs targeting duplications

Microduplication locus | SpCas9 Guide sequence
TCAP | AGCTGAGCTGCGAGGTGTCG
HPS1 | CAGCAGGGGAGGCCCCCAGC

These Cas9-sgRNA complexes (e.g., Cas9 RNPs) were delivered by nucleofection to LGMD2G iPSCs, myoblasts derived from LGMD2G iPSCs, or to the HPS1 B-EBV line. Following recovery and expansion of the nuclease-treated cells in culture, the target genomic region was amplified by PCR from the population of treated cells and the mutagenic products were characterized by TIDE or ICE analysis of the sequence chromatograms. Brinkman et al., “Easy quantitative assessment of genome editing by sequence trace decomposition” Nucleic Acids Research (2014). TIDE analysis of Cas9 RNP treated LGMD2G iPSCs revealed that ~53% of the alleles were converted back to a wild-type length. See, FIG. 3. ICE (Synthego) analysis of SpCas9 treated HPS1 B-EBV cells revealed that ~30% of the alleles were converted back to a wild-type length in the context of a ~41% total editing rate. See, FIG. 4. In summary, approximately 75% of the edited HPS1 alleles were converted to wild-type length.

To confirm the TIDE analysis of TCAP alleles, a deep sequencing analysis was performed on SpCas9 RNP treated iPSCs and iPSC-derived myoblasts. See, FIG. 5. The iPSC data are from two biological replicates and the myoblast data are from a single experiment. The data confirm that the majority of the editing products are micro-homology mediated deletions of one of the duplicate sequences. There are also a large number of additional sequences that have been shifted back in frame through mutagenesis, although they are not the wild-type coding sequence. These mutant in-frame sequences may also be functional.

Individual iPSC clones were taken from a SpCas9 RNP treated TCAP microduplication cell population and expanded to determine the genotype. A variety of different genotypes were observed within the clones that were analyzed. See, FIG. 6. Two of the 24 clones that were analyzed were homozygous for the wild type (WT) allele. Six “clones” could not be defined conclusively, likely because the colonies were initiated from 2 cells, generating more than 2 allele sequences. Overall, the analysis of 18 clones demonstrates that the majority of iPSC clones (~75%) contain at least one corrected allele of wild type sequence. See, FIG. 7.

In one embodiment, the present invention contemplates a method of differentiating SpCas9 treated iPSC clones into myoblasts to determine the number that display expression of the TCAP encoded protein telethonin.

The methods disclosed herein show an ability to precisely revert a microduplication back to its parental (e.g., wild type) sequence, which can correct genetic microduplication mutations underlying a number of diseases. Aside from those listed in Table 1 (supra), there may be a number of diseases that stem from microduplications, given the limited depth of genomic data that is associated with rare diseases.

An sgRNA was designed and tested for SpyCas9 to generate a DSB one base pair away from the middle of the TCAP 8-bp microduplication. See, FIG. 10B. Purified SpyCas9 protein was complexed with a synthetic sgRNA (RNP) and electroporated into iPSCs homozygous for the TCAP microduplication that were derived from patients with LGMD2G. After four days, deep sequencing analysis was used to analyse the genomic region of interest for insertions and deletions (indels). Robust gene editing (about 80% indel rate) was observed, indicating that the SpyCas9 RNP can efficiently generate DSBs at this site. Closer examination of the sequence variants revealed that on average about 57% of the alleles contained a precise 8-bp deletion corresponding to the wild-type allele. See, FIG. 10C and FIG. 20A.

Notably, when introduced into wild-type cells containing functional TCAP, the SpyCas9 RNPs did not cause measurable editing at the TCAP allele, indicating that the corrected allele in the mutant cells is not subject to unintended damage following MMEJ-mediated reversion. See, FIG. 10C. In addition to the precise 8-bp deletion, it was also observed that an additional approximately 17% of the alleles contained in-frame mutations, and therefore may encode hypomorphic alleles with some restoration of function. See, FIGS. 20A and 20C. Genotyping of 22 clones generated from a nuclease-treated LGMD2G iPSC population revealed that 77% contained at least one wild-type allele, indicating that the majority of nuclease-treated cells would be phenotypically corrected. See, FIG. 10D. To independently verify the duplication collapse rates observed in edited iPSCs by Illumina short-read sequencing, a 2-kb amplicon spanning the TCAP locus was sequenced from a population of SpyCas9-edited iPSCs using the Pacific Biosciences long-read sequencing platform (PacBio). Analysis of these reads revealed that 67% of the edited alleles with insertions or deletions below 100 bp in length corresponded to the 8-bp collapse, which is similar to the 73% rate of 8-bp collapse determined by Illumina sequencing for this sample. See, FIG. 21. Treatment of cells with Cas9 nuclease can produce large deletions (>100 bp) at the target locus at a modest frequency¹⁰. Consistent with these findings, the PacBio analysis revealed the presence of large deletions (100-1,000 bp), which would not have been detected by Illumina sequencing, at a frequency of about 2% in bulk edited iPSCs. A genotypically complex iPS cell colony was also isolated that harboured two large deletions at the TCAP locus. See, FIG. 22.

To demonstrate the translatability of the present approach to muscle cell types, LGMD2G iPSCs were differentiated into proliferative skeletal myoblasts that can be induced to terminally differentiate into myotubes¹¹. iPSC-derived myoblasts can repair damaged muscle in a similar way to myogenic satellite cells (one of the primary targets of gene therapy for myopathies). Myoblasts were electroporated with SpyCas9 RNPs programmed to target the 8-bp microduplication. Following editing, about 45% of the alleles were precisely repaired back to the wild-type sequence. See, FIG. 10E and FIGS. 20B & 20D. Immunostaining of myotubes derived from corrected LGMD2G iPSC clones with an anti-telethonin antibody showed that genetic correction restored telethonin expression. See, FIG. 23. Collectively, these data show that introducing a DSB close to the centre of a microduplication can efficiently achieve precise in vitro correction of the 8-bp microduplication associated with LGMD2G in iPSCs and in myoblasts that mimic cell populations that would be therapeutically targeted in vivo.

The present approach was further tested on a 16-bp pathogenic microduplication in exon 15 of HPS1, which is associated with HPS1 and leads to the production of a truncated protein responsible for this autosomal recessive disease¹². HPS1 has a high prevalence in the Puerto Rican population, with a carrier rate of approximately 1 in 21 in the northwest region². HPS proteins are involved in the biogenesis of lysosome-related organelle complexes (BLOCs), which are necessary for the proper trafficking of cargo to melanosomes, dense granules and lysosomes¹³. HPS1 patients suffer from albinism, bleeding disorders, vision loss and progressive pulmonary fibrosis, which leads to premature death¹⁴.

Gene correction efficacy was determined in a patient-derived B lymphocyte cell line (B-LCL) homozygous for the 16-bp microduplication by electroporating these cells with SpyCas9 RNPs programmed to cleave two base pairs away from the centre of the microduplication. See, FIG. 11A (target site 1). To accurately assess the observed editing rates, unique molecular identifiers (UMIs) were added to the PCR amplicons during Illumina library construction to allow the removal of any amplification bias¹⁵. It was confirmed that this approach accurately captured the relative percentage of HPS1 microduplication and wild-type alleles present in a series of test populations. See, FIG. 24. At HPS1 target site 1, editing was observed at about 46% of the alleles, with around 35% restored to the wild-type sequence. See, FIGS. 11B & 11C.
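The role of the UMIs can be illustrated with a minimal sketch in which each UMI is counted once, so that a template amplified many times contributes a single observation. The function name, the (UMI, allele) read representation and the toy reads are assumptions made for illustration; they do not describe the actual analysis pipeline.

```python
from collections import Counter, defaultdict


def umi_collapsed_fractions(reads):
    """Collapse reads sharing a UMI to one consensus allele, then report allele fractions.

    reads: iterable of (umi, allele_label) pairs (hypothetical representation).
    """
    by_umi = defaultdict(Counter)
    for umi, allele in reads:
        by_umi[umi][allele] += 1
    # One vote per UMI: the most frequently observed allele among its reads.
    consensus = Counter(counts.most_common(1)[0][0] for counts in by_umi.values())
    total = sum(consensus.values())
    return {allele: n / total for allele, n in consensus.items()}


# Toy data: one wild-type template over-amplified ten-fold, two mutant templates read once.
toy_reads = [("AAAC", "WT")] * 10 + [("GGTA", "dup"), ("CCTT", "dup")]

raw = Counter(allele for _, allele in toy_reads)
print({allele: n / len(toy_reads) for allele, n in raw.items()})  # biased by amplification
print(umi_collapsed_fractions(toy_reads))                         # one count per template
```

Counting raw reads in this toy overstates the wild-type fraction, whereas counting one consensus per UMI recovers the one-to-two ratio of the original templates.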

The effect of the position of the DSB within the microduplication on the efficiency of MMEJ-mediated repair was further examined by designing five additional sgRNAs that targeted the DSB to different positions relative to the centre of the microduplication. See, FIG. 11A (target sites 2-6). As the break site was shifted away from the centre, there was a decrease in the efficiency of achieving the precise 16-bp deletion. See, FIGS. 11B & 11C. However, target sites 3 and 6 were notable exceptions to this trend. Target site 3 was observed to be quite efficient at generating indels to the exclusion of the 16-bp deletion, probably because the wild-type sequence, once regenerated, can also be targeted by this sgRNA for further mutagenesis. See, FIG. 25. On the other hand, target site 6 achieved efficient deletion of the 16-bp microduplication (more than 50% of the modified alleles), despite being the most distal of the cleavage sites (10 bp from the centre of the microduplication). Its efficiency may be due to the extended regions of homology that surround the cleavage site at this end of the microduplication. See, FIG. 11A (target site 6). Overall, these results demonstrate that the cleavage position within the microduplication and the presence of alternate regions of microhomology can influence the production of the desired wild-type end product. See, FIG. 11C.

To investigate whether nuclease-mediated collapse of a microduplication occurs via the MMEJ pathway, a DNA repair factor (PARP-1) that regulates DSB flux through this pathway was inhibited. PARP-1 influences the repair of a DSB through resection-dependent DNA repair pathways, such as MMEJ^(3,16), which are in competition with the non-homologous end joining pathway (NHEJ) for DSB repair¹⁷. See, FIG. 18A. Inhibition of the catalytic activity of PARP-1 by rucaparib reduces DSB flux through the MMEJ pathway, resulting in a decrease in microhomology-based deletion products in the resulting repair events¹⁸. See, FIG. 18A. Patient-derived HPS1 B-LCL cells were treated with 10 μM or 20 μM rucaparib before and after treatment with SpyCas9 RNP to suppress MMEJ-mediated repair of DSBs. See, FIG. 12A. An overall reduction in editing rates was observed at the HPS1 locus upon rucaparib treatment. See, FIG. 12B. These lower editing rates were primarily the result of a reduction in the 16-bp deletion product, which decreased from about 50% in untreated cells to around 15% and 6% in cells treated with 10 μM and 20 μM rucaparib, respectively. See, FIGS. 12B-12D. A similar reduction in microhomology-based deletions was observed with SpyCas9 RNP targeting the AAVS1 locus in patient-derived HPS1 B-LCL cells. See, FIG. 18. Thus, the MMEJ pathway underlies the robust correction of the microduplications for LGMD2G and HPS1 in the presence of a targeted DSB.

B. Tay-Sachs Disease (HEXA Gene)

In one embodiment, the present invention contemplates a method for HEXA editing by Cas12a to correct a mutated sequence of the Tay-Sachs locus.

Two different Cas12a (also known as Cpf1) orthologs (LbCas12a and FnCas12a) were tested for their ability to drive microhomology-mediated end joining (MMEJ) to collapse the common GATA microduplication in HEXA that is associated with Tay-Sachs disease. The GATA⋅GATA duplication (red and blue segments) results in a frameshift within the gene that inactivates it and leads to Tay-Sachs disease if both HEXA alleles are disrupted. See, FIG. 14. This allele occurs with a frequency of ~1 in 100 individuals in some Jewish populations.

crRNAs were designed to target Cas12a cleavage to the region spanning the microduplication to revert it to the wild-type sequence through MMEJ repair. One crRNA was designed for FnCas12a to utilize a TTC PAM (FnCas12a Guide). See, FIG. 14. Two crRNAs were designed for LbCas12a to utilize either a CTTC PAM (LbCas12a Guide 1) or a TTCC PAM (LbCas12a Guide 2). See, FIG. 14. For LbCas12a, these PAMs are not the optimal TTTV sequence, which may result in lower activity.
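The relationship between a PAM and the guide that follows it can be sketched as a simple scan. This is an illustrative helper, not the design procedure actually used: the function name and the `LOCUS` fragment are assumptions, the fragment being pieced together from the PAMs named above and the DNA equivalents of the guide sequences in Table 3, and only the given strand is scanned.

```python
import re

# Minimal IUPAC subset needed for the PAMs discussed here (V = A, C or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "[ACG]", "N": "[ACGT]"}


def find_cas12a_guides(seq: str, pam: str, guide_len: int = 23):
    """Return (position, PAM, guide) for each PAM followed by guide_len bases.

    Cas12a reads its PAM 5' of the protospacer, so the guide is taken 3' of the PAM.
    """
    pattern = "".join(IUPAC[base] for base in pam)
    # A lookahead keeps overlapping PAM matches from hiding one another.
    regex = re.compile(f"(?=({pattern})([ACGT]{{{guide_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in regex.finditer(seq)]


# Assumed locus fragment containing the GATA-GATA duplication.
LOCUS = "CTTCCAGTCAGGGCCATAGGATAGATAT"

for pam in ("CTTC", "TTCC", "TTTV"):
    print(pam, find_cas12a_guides(LOCUS, pam))
```

On this fragment the CTTC and TTCC PAMs each yield one 23-nt guide offset by a single base, matching LbCas12a guides 1 and 2, while the optimal TTTV PAM yields none.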

crRNAs (120 pmol) were complexed with 60 pmol of purified FnCas12a-2xNLS or LbCas12a-2xNLS protein and then electroporated into a B-EBV cell line that is homozygous for the GATA microduplication in HEXA (Coriell GM11852). See, Table 3.

TABLE 3 crRNA Sequences for crRNAs targeting HEXA duplication

crRNA/Cas12a | Cas12a crRNA sequence
LbCas12a Hexa guide 1 | UAAU UUCUACUAAGU GUAGAU CAGUCAGGGCCAUAGGAUAGAUA
LbCas12a Hexa guide 2 | UAAU UUCUACUAAGU GUAGAU AGUCAGGGCCAUAGGAUAGAUAU
FnCas12a Hexa guide | UAAU UUCUACUGUU GUAGAU CAGUCAGGGCCAUAGGAUAGAUA

In the crRNA sequences the constant region is in bold. Double underline indicates the base pairing regions of the hairpin stem. Single underlined sequence is the guide sequence (23 nt).

After 72 hours the genomic DNA from treated cells was harvested, and the genomic region of interest within HEXA was PCR amplified and submitted for Sanger sequencing. Mutation rates were determined by TIDE analysis (tide.deskgen.com) in comparison to an unedited sequence chromatogram from the same genomic region. Total indels were modest (~5 to 10%). See, FIG. 15 (blue bars). A reversion to the wild-type sequence was observed in all of the samples (brown bars), where for LbCas12a guide 2 the majority of the alleles that were edited restored the desired wild-type sequence. The experiment was performed in biological triplicate, where the error bars represent the standard error of the mean.

Representative sequence chromatograms show sequencing on the complementary strand. See, FIG. 16. The TATC duplication (complement of GATA) is boxed in magenta and the respective guide target sequences for each nuclease are underlined in green.

The concentration of the delivered Cas12a:crRNA was then increased for each nuclease:guide combination from 90 pmol to 180 pmol. The editing rates after electroporation of the HEXA GATA duplication in the B-EBV line in a single experiment were improved (cyan bars) and the rate of wild-type sequence reversion was also increased, reaching nearly 10% for the LbCas12a guide 2 treated cells (orange bars). See, FIG. 17. These data demonstrate the feasibility of reverting the mutant allele to the wild-type DNA sequence through the introduction of a targeted double-strand break without the need of a donor DNA sequence (or homology directed repair) for this restoration.

Those in the art would appreciate that the current data showing the correction of disease-causing alleles for two different diseases provide an expectation that the technology has widespread applicability. Furthermore, as more diverse programmable nuclease systems are defined (e.g., CRISPR systems) that have broader targeting range and better delivery properties, this type of approach will become easier to perform in vivo. Although it is not necessary to understand the mechanism of an invention, it is believed that this approach may also work efficiently for genetic diseases based upon repeat expansion mutations if DSBs can be targeted just inside the edges of the repeat elements to allow the induction of long-range microhomology mediated repair.

IV. Genomic Microduplication Variants

To investigate whether this MMEJ-based therapeutic strategy can be applied more broadly for correcting human genetic disorders, a bioinformatic analysis was performed to gauge the prevalence of disease-causing microduplications in human populations. The ClinVar database²⁰ includes about 4,700 duplications that are annotated as ‘pathogenic’ or ‘pathogenic/likely pathogenic’. See, FIG. 26A. Duplications of lengths ranging from 2 to 40 bp were of particular interest because the data presented herein indicate that microhomologies within this range can be precisely repaired via the MMEJ pathway. See, FIG. 13. ‘Simple’ duplications (those for which the duplicated sequence is not part of a more complex repeat structure) were also evaluated as to whether they improve the odds that the primary homology-based collapse would result in the desired wild-type sequence. Finally, all duplications in ‘coding’ regions (mainly exons plus 50 flanking bases) were examined from the gnomAD exome and genome sequencing databases²¹ to prioritize pathogenic duplications according to their frequencies in human populations. See, FIGS. 26A and 27. The present analysis yielded 143 likely disease-causing microduplications of lengths 2-40 bp that were observed at least once in gnomAD, some of which occur in specific subpopulations at substantial frequencies (for example, Tay-Sachs disease). See, FIG. 26B.

To facilitate the utilization of a bioinformatics analysis, the present invention was accompanied by the creation of an interactive, searchable webtool (rambutan.umassmed.edu/duplications/). This bioinformatics analysis also included the identification of potential Cas9 and Cas12a cleavage sites within these microduplications²². As shown within the tool, ‘tiling’ data across HPS1 microduplications and endogenous microduplication sites, the position of the DSB within the duplication, and the use of a guide design that avoids cleavage of the wild-type allele, facilitate an efficient, stable collapse of microduplications. Rapid advances are being made in characterizing nucleases with alternate specificities^(23,24) and in engineering nucleases with alternate or expanded recognition preferences²⁵⁻²⁷, which will make correction of disease-causing microduplications using the MMEJ-based approach even more effective.

The results below for the most part are based on the files of “coding” variants from gnomAD genomes and exomes, version 2.0.2. gnomad.broadinstitute.org/downloads. This database comprises variants in the intervals used for the ExAC database. Most of these intervals correspond to exons plus 50 flanking bases on each side, and they collectively cover 60 million bases, about 2% of the genome. Note that there are no variant calls for the Y chromosome, and these are not strictly all coding variants, as some are in introns, UTRs, miRNA, ncRNA.

The 1000 Genome Project data was taken from ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The vcf files there include precomputed allele-frequencies for five broad super-populations, and allele-frequencies for 26 more-specific populations were computed from the per-individual genotypes in the vcf files aggregated using the population assignments from the file integrated_call_samples_v30.20130502.ALL.panel.
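The per-population aggregation described above can be sketched as follows, under stated assumptions: genotypes are given as VCF-style strings for a single biallelic site, and the panel file has already been read into a sample-to-population mapping. The function name and the sample IDs are hypothetical.

```python
from collections import defaultdict


def population_allele_frequencies(genotypes, sample_to_pop):
    """Aggregate ALT-allele frequency per population for one biallelic site.

    genotypes:     {sample_id: "0|1"-style genotype string}
    sample_to_pop: {sample_id: population code}, as read from a panel file
    """
    alt_count = defaultdict(int)
    total = defaultdict(int)
    for sample, genotype in genotypes.items():
        pop = sample_to_pop[sample]
        for allele in genotype.replace("/", "|").split("|"):
            if allele == ".":        # missing call: contributes no chromosome
                continue
            total[pop] += 1
            alt_count[pop] += allele == "1"   # counts only the first ALT allele
    return {pop: alt_count[pop] / total[pop] for pop in total}


# Hypothetical samples; PUR and CEU are used here only as example population labels.
toy_genotypes = {"S1": "0|1", "S2": "1|1", "S3": "0|0", "S4": ".|."}
toy_panel = {"S1": "PUR", "S2": "PUR", "S3": "CEU", "S4": "CEU"}
print(population_allele_frequencies(toy_genotypes, toy_panel))
```

Haploid calls (for example on the X chromosome in males) are handled naturally because each allele in the genotype string is counted separately.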

The ClinVar annotations were taken from the file ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180225.vcf.gz. Here, the variants have been normalized (trimmed and left-aligned) for the purpose of matching them up, but the HGVS notation used at ClinVar may follow the right-aligned (3′-most position) convention, in which duplications are taken to occur immediately after the repeated sequence rather than immediately before the repeated sequence.
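The difference between the two conventions can be illustrated with a minimal sketch that shifts an insertion through a repeat without changing the resulting sequence. An insertion is represented here as an insertion point (0-based index of the reference base that follows the inserted bases) plus the inserted bases; this representation, the function names and the toy reference are assumptions made for illustration.

```python
def left_align(reference: str, point: int, inserted: str):
    """Shift an insertion to its 5'-most equivalent position."""
    while point > 0 and reference[point - 1] == inserted[-1]:
        inserted = inserted[-1] + inserted[:-1]   # rotate the inserted bases right
        point -= 1
    return point, inserted


def right_align(reference: str, point: int, inserted: str):
    """Shift an insertion to its 3'-most equivalent position."""
    while point < len(reference) and reference[point] == inserted[0]:
        inserted = inserted[1:] + inserted[0]     # rotate the inserted bases left
        point += 1
    return point, inserted


def apply_insertion(reference: str, point: int, inserted: str) -> str:
    return reference[:point] + inserted + reference[point:]


# Toy reference (assumption) carrying a single GATA copy at indices 19-22.
TOY_REFERENCE = "CTTCCAGTCAGGGCCATAGGATAT"

left = left_align(TOY_REFERENCE, 23, "GATA")
right = right_align(TOY_REFERENCE, 19, "GATA")
print(left, right)
print(apply_insertion(TOY_REFERENCE, *left) == apply_insertion(TOY_REFERENCE, *right))
```

In the left-aligned form the extra copy is placed immediately before the reference copy, and in the right-aligned form immediately after it; both describe the same variant sequence.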

The gnomAD genome files contain a total of 4851138 distinct variant alleles, of which 145892 (~3%) are insertions. The gnomAD exome files above contain a total of 17009588 distinct variant alleles, of which 414576 (~2.4%) are insertions. Note that many of these variants are common to both the exomes and genomes, but in the tables below variants that occur in both are counted only once.

Table 4 below focuses on the insertions, and in particular the duplications. The second column (insertions) gives the counts of all the distinct insertion variant alleles, binned by the length of the insertion (length), with all variants of length at least 40 combined into one bin. Subsequent columns give the number of variants that satisfy additional criteria, as follows:

-   dup: the insertion is an exact duplication of the immediately adjacent sequence in the GRCh37 reference genome (immediately 3′ with this normalization). Note that there may be polymorphisms in this adjacent sequence that affect whether an insertion is indeed a perfect duplication for any given individual.
-   dup2: the insertion does not add a repeat-unit to what is already a (two-or-more unit) tandem repeat in the reference genome. This eliminates e.g. the duplication of CCCGGG in RAX2, as the reference genome already has two immediately adjacent (3′) tandem copies of this: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_refcgi?rs.rs549932754
-   dup2i: the insertion satisfies the previous constraints and is not itself a perfect tandem repeat (e.g., for a duplicated six-mer, it is not of the form XXXXXX, XYXYXY or XYZXYZ). Note that even if a duplicated sequence is not in itself a perfect tandem repeat it may contain internal tandem repeats (e.g. the AGGAGG in the duplicated AAGGAGGATC in NCF4), so depending on where the Cas9 cleavage site is this may need to be considered, to prevent a shorter internal microduplication from being collapsed instead of the full duplication.
-   dup2iC: the variant satisfies the previous constraints and is also listed in ClinVar.
-   dup2iL: the variant satisfies the previous constraints and is reported in ClinVar as “Pathogenic”, “Pathogenic/Likely_pathogenic”, “Likely_pathogenic” or “Conflicting_interpretations_of_pathogenicity”
-   dup2iP: the variant satisfies the previous constraints and is reported in ClinVar as “Pathogenic” or “Pathogenic/Likely_pathogenic”

TABLE 4 Microduplication Variant Characteristics In ClinVar And gnomAD Databases

length          insertions  dup     dup2    dup2i   dup2iC  dup2iL  dup2iP
1               210230      179654  59169   59169   399     242     182
2               51418       29880   11919   7562    53      25      19
3               39579       23795   12892   11141   77      11      4
4               30704       18835   14615   13010   112     70      52
5               15142       6890    4754    4189    28      16      10
6               18971       11102   6125    5251    46      7       3
7               9634        3793    2976    2623    10      5       4
8               9123        3819    3038    2739    12      9       7
9               9818        5155    3979    3686    17      3       2
10              5756        1997    1683    1502    12      9       8
11              4326        1311    1236    1195    10      6       4
12              6249        3384    2957    2649    18      3       0
13              3207        1099    1068    1042    7       4       2
14              3068        1031    993     942     5       2       1
15              4307        2311    2190    2110    19      7       3
16              2813        1173    1128    1086    8       4       4
17              2438        1099    1069    1067    9       7       6
18              4316        2646    2552    2459    14      4       4
19              2065        1012    997     997     5       3       1
20              2148        1082    1045    1001    6       5       3
21              3463        2218    2141    2127    11      2       1
22              1687        818     806     799     1       0       0
23              1395        690     670     670     3       3       3
24              2272        1283    1244    1221    7       2       1
25              1149        485     477     471     1       1       1
26              1006        356     353     350     3       0       0
27              1373        653     635     631     6       1       0
28              878         314     308     304     3       1       1
29              751         239     233     233     1       1       0
30              1321        579     549     536     1       0       0
31              693         194     189     189     1       1       1
32              695         193     187     182     0       0       0
33              772         272     263     262     3       0       0
34              590         169     164     157     0       0       0
35              528         121     117     116     0       0       0
36              743         244     236     225     1       0       0
37              457         106     102     102     1       0       0
38              474         122     115     113     0       0       0
39              524         149     140     140     2       0       0
40+             12413       1818    1800    1756    7       1       1
Totals: 1-40+   468496      312091  147114  136004  919     455     328

Note that the filters used in columns dup2 and dup2i are included to increase the chances of MMEJ restoring the duplication to exactly its wild-type form, via removal of exactly one complete copy of the duplicated sequence, but these filters may not be strictly necessary when suitable positions in the duplication can be specifically targeted for cleavage.
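The concern behind these filters can be illustrated with a deliberately simplified model in which a repair event deletes one copy of any directly repeated segment. The Python sketch below enumerates the distinct products of such deletions. The function name, the flanking bases, and the model itself (which ignores cleavage position, resection and repair kinetics) are assumptions made for illustration; only the duplicated unit AAGGAGGATC and its internal AGGAGG repeat are taken from the NCF4 example above.

```python
def tandem_collapse_products(seq: str, min_unit: int = 2) -> dict:
    """Map each distinct deletion product to the repeat-unit lengths that yield it.

    A product is formed by removing one copy of any segment that is
    immediately followed by an identical copy of itself.
    """
    products = {}
    n = len(seq)
    for unit in range(min_unit, n // 2 + 1):
        for start in range(0, n - 2 * unit + 1):
            if seq[start:start + unit] == seq[start + unit:start + 2 * unit]:
                product = seq[:start] + seq[start + unit:]
                products.setdefault(product, set()).add(unit)
    return products


# Hypothetical flanks around the duplicated unit from the NCF4 example.
left, unit, right = "CTGAC", "AAGGAGGATC", "TGCAT"
wild_type = left + unit + right
variant = left + unit + unit + right

products = tandem_collapse_products(variant)
restores_wild_type = wild_type in products
```

In this toy case the full 10-base collapse restores the wild-type sequence, while collapsing the internal AGG repeat in either copy gives two other products that are not wild type. That is the situation the dup2i note warns about.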

Many duplications do not appear annotated as “Pathogenic” in ClinVar. Certainly there are many variants listed in ClinVar that are not observed in either the gnomAD genomes or exomes, so are not accounted for in the table above, and this includes 2189 duplications that satisfy all the additional conditions for being in column dup2iP above. But 2183 of these are in the “coding” intervals, so if the variants had been observed at all in gnomAD they would have been reported in these vcfs. It also would not be surprising to miss variants that are extremely rare in general, or even not-terribly-rare variants that are concentrated in populations without many samples: with only ~100 subjects per population in the TGP data one would expect to miss out on ~13% of alleles with frequency 0.01 in these populations. A few other possibilities:

-   -   Subjects known to have severe pediatric disease were not included in the gnomAD dataset, so variants that cause these diseases may be under-represented, in particular those with dominant inheritance.
    -   About 8% of the genome was masked during the gnomAD variant calling (e.g. some repetitive sequence), so any ClinVar variants that fall in these regions will not be reported in gnomAD. But it appears that only one of the ~13000 insertions from ClinVar falls in one of these masked regions: a benign variant in SHOX1, which is masked since it is in a PAR on the Y chromosome.
    -   gnomAD doesn't report variants on the Y chromosome at all, whether in masked regions or not. But the only “Pathogenic” duplication on the Y chromosome in ClinVar is a single-base insertion, Y:2655380:C/CT in the gene SRY: ncbi.nlm.nih.gov/clinvar/variation/470195/
    -   The longest insertion reported in the gnomAD coding regions has length 621, and there are 1431 insertions of length at least 100, but it's possible that the detection sensitivity may decline for longer insertions.

The variants described above are listed below; they are mainly variants in UTRs or in intronic regions >50 bases from the nearest exons (and hence not in the “coding” intervals list). These sequences are expected to represent “Pathogenic” duplications with a length of at least 100 from ClinVar that satisfy the conditions of the column dup2iP above, none of which are observed in gnomAD. See Table 5. The first base is the reference allele; the subsequent bases are the inserted sequence.
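The estimate that roughly 13% of alleles with frequency 0.01 would be missed in a population of about 100 subjects can be reproduced with a simple sampling calculation. The sketch below assumes diploid subjects whose chromosomes are drawn independently at the stated allele frequency. That assumption is not spelled out in the text, and the function name is hypothetical.

```python
def prob_allele_unobserved(freq: float, n_subjects: int, ploidy: int = 2) -> float:
    """Probability that an allele of the given frequency is absent from every
    sampled chromosome, assuming independent draws (illustrative assumption)."""
    return (1.0 - freq) ** (ploidy * n_subjects)


# 100 diploid subjects give 200 sampled chromosomes: 0.99 ** 200 is about 0.134
missed = prob_allele_unobserved(0.01, 100)
```

Under these assumptions the result is about 0.134, in line with the ~13% figure quoted above.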

TABLE 5Duplications Not Contained In ″Coding″ Inverval List Or Having Length AtLeast 100 That Are In ClinVar But Not Present In Either ClinVar OrgnomAD DatabasesGTGAGCCACTGCGCCCAGCAGATTCAAGCTTTTTAAATGGAATTTTGAGCTGATTTAGTTGAGACTTACGTGCTTAGTTGATAAATTTTAATTTTATACTAAAATATTTTACATTAATTCAAGTTAATTTATTTCAGATTGAATTTAGTGGAAGCTTTTGTAGAAGATGCAGAATTGAGGCAGACTTTACAAGAAGATTTACTTCGTCGATTCCCAGATCTTAACCGACTTGCCAAGAAGTTTCAAAGACAAGCAGCAAACTTACAAGATTGTTACCGACTCTATCAGGGTATAAATCAACTACCTAATGTTATACAGGCTCTGGAAAAACATGAAGGTAACAAGTGATTTTGTTTTTTTGTTTTCCTTCAACTCATACAATATATACTTGGCAATGTGCTGTCCTCATAAAGTTGGTGGTGGTGACTCACTCTTAGGACACATTCAGATTTCTT AG GTGAGCTTATCAGGTTCTCCATTGGCAGGCAGGGCTCTAAGTGCAGTAACTTGATTTGCTGTTGTATTTGCTTAGGAAGAGCAGCACTTCAGAAAAGAGTGATGGCACTGCTGAGGCGCATTGAGCATCCCACTGCAGGAAACACTGAGGTATGCCCTTAGCAACAGAAACACCCCTCCCAGGCGCCCACCCTCAATTTGGAAGCCTCTTGTTACATATGTGTGATCAGGAATAGCTTTTGAAGTAAATCCAAGATACGTGCATATTACAAGTATAATATCTGAGTATTTAATATACATCAAGTTTGAAACTTGGCTGTAGCTGATTGATGTTTAGCTCTTGGGTACGAGTGTCTGCGTATATCTGTATGCTTATTTGGCTCTATGCCTGTGGGTGCACTTACTCTGTGTGTTTAGATCAGTCAGTTTCATCTCTCTAGGGGGTCTGTCTTCTGGGCATTGATGGCAAATCATTAATGTATTTGTTCTTTCTTTAGGTTTTATTGACTGATACCAATACTCAATTTGTAGAACAAACCATAGCTATAATGAAGAACTTGCTAGATAATCATACTGAAGGCAGCTCTGAACATCTAGGGCAAGCTAGCATTGAAACAATGATGTTAAATCTGGTCAGGTAAGCATTCTACTGAAATGTAGCAGAAACATTTTAAGAGATAAGAAAAACCTCTTACACACTGATACTGGTAGTAATTGATAAAATAACTGGCCATTCTTTACTGCACACAAACTA ACCGGTTCCGGCGGCCGGGGCTGGTGAGCCACTGCGCCCAGCAGATTCAAGCTTTTTAAATGGAATTTTGAGCTGATTTAGTTGAGACTTACGTGCTTAGTTGATAAATTTTAATTTTATACTAAAATATTTFACATTAATTCAAGTTAATTTATTTCAGATTGAATTTAGTGGAAGCTTTTGTAGAAGATGCAGAATTGAGGCAGACTTTACAAGAAGATTTACTTCGTCGATTCCCAGATCTTAACCGACTTGCCAAGAAGTTTCAAAGACAAGCAGCAAACTTACAAGATTGTTACCGACTCTATCAGGGTATAAATCAACTACCTAATGTTATACAGGCTCTGGAAAAACATGAAGGTAACAAGTGATTTTGTTTTTTTGTTTTCCTTCAACTCATACAATATATACTTGGCAATGTGCTGTCCTCATAAAGTTGGTGGTGGTGACTCACTCTTAGGACACATTCAGATTTCTTTTGGGAGCTAACGGCTTGGAGCTTCTTTCCAGGGATGGGGACCTGGAATTTGAGTACTGGTAGACTTTTCGTTGTTCAAACCATTCCTTCACAAATTCCTGAGGAAGGCCCACA 
GCTACCTTGGGCCTGGGCCGCAGAGCTGTGAGAATACCCCAGGGCCAGGAGCGCAGTCTCCACCAGCTGGCTAAAAAGCACATCTTTCCGCACCAGGACAAACTCGGCGTGTTCTTCTCTGTTGTCATATTCAAGAGAGCCGTCCAACTGCTCCACGACACAAAAGACAGGA ATCATCAAGTGCGCGGGCGGCGGCCGGAAGGGCCTCTTCATGCGGCGGCGGCGCCGGTAGTTGCCCTTCTCGAACATGTCTTCGCAGGCCGGGTCCAGCGTCCAGTAGTTGCCCTTGCGCT CGCCGCCGCCCTCCGGGCTGGAGAGGGGGATGTTGAGGAGGCTGGGGGTGGGGGCGGGGCATCGAGGGAGCTCCTGGTACTGGCGGCCCCGACTGTCCCCCCAGAAGCTGAAAATGTTGGACACTCCTGAGAAGGCGCCTGCAGCCAGAGAGCAGAGCTGGGTGAGCGGGGTAGACGCACCACCGCTGCCACGCCCGGTCCTCCCTCGCCCGCCCGTCGCCCGGGATACCTGACAGGGGGTTGCAAGTGTCGCTGCTCTTCTCGCAGTCCTCCATCAGGGGCTCCCCAGAGCTTATCAGGTTCTCCATTGGCAGGCAGGGCTCTAAGTGCAGTAACTTGATTTGCTGTTGTATTTGCTTAGGAAGAGCAGCACTTCAGAAAAGAGTGATGGCACTGCTGAGGCGCATTGAGCATCCCACTGCAGGAAACACTGAGGTATGCCCTTAGCAACAGAAACACCCCTCCCAGGCGCCCACCCTCAATTTGGAAGCCTCTTGTTACATATGTGTGATCAGGAATAGCTTFTGAAGTAAATCCAAGATACGTGCATATTACAAGTATAATATCTGAGTATTTAATATACATCAAGTTTGAAACTTGGCTGTAGCTGATTGATGTTTAGCTCTTGGGTACGAGTGTCTGCGTATATCTGTATGCTTATTTGGCTCTATGCCTGTGGGTGCACTTACTCTGTGTGTTTAGATCAGTCAGTTTCATCTCTCTAGGGGGTCTGTCTTCTGGGCATTGATGGCAAATCATTAATGTATTTGTTCTTTCTTTAGGTTTTATTGACTGATACCAATACTCAATTTGTAGAACAAACCATAGCTATAATGAAGAACTTGCTAGATAATCATACTGAAGGCAGCTCTGAACATCTAGGGCAAGCTAGCATTGAAACAATGATGTTAAATCTGGTCAGGTAAGCATTCTACTGAAATGTAGCAGAAACATTTTAAGAGATAAGAAAAACCTCTTACACACTGATACTGGTAGTAATTGATAAAATAACTGGCCATTCTTTAC 
TGCACACAAACTACCCAATTCAATGTAGACAGACGTCTTTTGAGGTTGTATCCGCTGCTTTGTCCTCAGAGTTCTCACAGTTCCAAGGTTAGAGAGTTGGACACTGAGACTGGTTTCCTGCTAAACAGTATGGTAAAGAACAGTCAAGCAATTGTTGGCCAGTTCTGTGCTTTTCCTCCTGAAGAGAAACTTGACACCATGGACAAAATAAATTGACCATCATCAGTCAGCTAACATGTATGATGCCTGGAAAAAATGCCCAGGAATTTACACACTAAAATGTCTGGGGCTGGGAGCGGTAGCTCATGCCTATAATCCCAGCACTTTGGGAGGCTGGAGCAGGACTGCTTGAGGCCAGGAGTTCAAGACCAGCATAAGCAACAGAGTGAGACCCAGTCTCTACAAAATAATAGTAGTAGTAATAATAAAATGTGTGGGATATGTGTGATTTGAATTTTTTTTTCTGTTGTCTTAAATTTTTCAAACCTGATTATGTATTATTTGTGTAATTTTTGAAGTATTAATATAGCATATTTTGAAGCTGATACTTGATATACATTCCAATCACATCTGATAACTTTTTTTTTTGTTTTGGGGGGTGTACAGAGTCCTGCTCTGTCACCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAACCTCCGCCTCCTAAGTTCAAGAGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGTCTACAAGCGTGTGCAACTATGCCTGGCTAATTTGTGTGTGTGTGTGTATATATATATACATATATATGTGTGTGTGTGTGTATATATATATATAACATATATATAACATATATATATTATATATATATAACATATATATAACATATATATATGTTATATATATATAACATATATATAACATATATATATATATATATATAATATATATATATATATATATATATGTAATCCCAGCACTTTGGGATATATGTGTATATATGTTTTTTTTTTTTGAGACAGAATCTTGCTCTGTTGCCAGGCTAGAGTGCAGTGGCGTGATCTCGGCACACTGCAACCTCCACCTCCCTGGYTCAGGGGGGCCATTGTGGAAAAGAGCCTGCAGGGAGAGCAAACAGCGCGGTCATGGCCTCGGGAGCTGTGCGCGGCGCCTCGGGCAGCGTCTCCCGCCGCTTGTCGCC

Duplication variants were identified that are annotated as “Pathogenic” or “Pathogenic/Likely_pathogenic” in the ClinVar database (Table 4, dup2iP column) and observed in the gnomAD exome database. See Table 6.

Table 6 has the following headings:

-   -   SEQ_DUP shows the duplicated sequence in upper-case (including the extra copy on the variant allele), and flanking sequence in lower case.
    -   DUP_NUM labels the copies of the duplicated segment (1 for first copy, 2 for second). These positions are also color-coded yellow and cyan in the html table, but the colors are lost when exporting the table as a CSV or Excel file.
    -   Cas9_Wa shows cleavage sites for Cas9 and xCas9 enzymes on the Watson strand, 3 bases left of PAM starts from PAM_FW_* columns.
    -   Cas9_Cr shows cleavage sites for Cas9 and xCas9 enzymes on the Crick strand, 4 bases right of PAM starts from PAM_RC_* columns.
    -   Cpf1_Wa shows approximate cleavage sites for Cpf1 enzymes on the Watson strand, 19 bases right of PAM ends (from the PAM starts in PAM_FW_* columns and adjusted for motif widths).
    -   Cpf1_Cr shows approximate cleavage sites for Cpf1 enzymes on the Crick strand, 18 bases left of PAM ends (from the PAM starts in PAM_FW_* and adjusted for motif widths).
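The Cas9 offsets described above can be illustrated with a short Python sketch that scans a sequence for NGG PAMs on both strands and applies the stated offsets. Several points are assumptions made for illustration: indices are 0-based on the Watson strand; only the canonical NGG PAM is considered (not the additional xCas9 PAMs); for the Crick strand, the "PAM start" is taken to be the Watson coordinate of the first PAM base as read 5′ to 3′ on the Crick strand; and the letter codes used in the table rows are not reproduced. The function name is hypothetical.

```python
import re


def cas9_cleavage_sites(seq: str):
    """Return (watson_sites, crick_sites) as 0-based Watson-strand indices.

    Watson strand: site = PAM start - 3, where the PAM (NGG) starts at its N.
    Crick strand: the PAM appears as CCN on the Watson strand; its start, read
    5'->3' on the Crick strand, is the N at the right end, and site = start + 4.
    """
    seq = seq.upper()
    watson, crick = [], []
    # Lookahead patterns so that overlapping PAMs are all reported.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        site = m.start() - 3
        if site >= 0:
            watson.append(site)
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        site = (m.start() + 2) + 4
        if site < len(seq):
            crick.append(site)
    return watson, crick
```

Under these conventions both offsets mark the base immediately to the right (on the Watson strand) of the predicted blunt cut, so a site falling within the span labeled 1 or 2 in DUP_NUM would indicate a cut inside the microduplication.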

TABLE 6 Selected Genes Having Microduplication Variants SYMBOLMICRODUPLICATION INFORMATION PEX10ctgccgctgcctgaaaccgtacagcTTgcagccccatggacagcaccaggtg SEQ_DUP;.........................12......................... DUP_NUM;.A..A...aB.J..A....A...A..A......aaJ..A.....aa.A..A. Cas9_Wa;.A.a.a..Aa.a..Aa.....Aa...A..a...A..Aaaa.....A..A.aa Cas9_Cr;.....L......NL..................................N... Cpf1_Wa;................L................................... Cpf1_Cr; CLCNKBcgtggctggagggatcaccaatcccAAtcatgccaggggggtatgctctggc SEQ_DUP;.........................12......................... DUP_NUM;A..aa.aaac.................A...aaaaaA...A....aA..aaH Cas9_Wa;..A.Ca....A........Ca.aa..CaaaI.Ca..IAa...........AJ Cas9_Cr;....k.M..NLL..................................L..... Cpf1_Wa;....n............................................... Cpf1_Cr; ZMPSTE24tgtgggagatgccggccgagaagcgTTatcttcggggccgtgctgctctttt SEQ_DUP;.........................12......................... DUP_NUM;aa.aC.A..aA..a.aB.a.A........aaaA..a.A.A...........a Cas9_Wa;A.a.............Aa..Aa......A....Ca.Ba....Aa...A..a. Cas9_Cr;......................k.........................N.M. Cpf1_Wa;..................................L...............L. Cpf1_Cr; MMACHCtgtggcctaccatctgggccgtgttAAgagaggtgaggaaggctcagttttc SEQ_DUP;.........................12......................... DUP_NUM;A..........aaA..a.A....a.a.aa.a.aaB.aA....A......... Cas9_Wa;a.......J.aa..Aa.Ca....Aa......J...............A.a.. Cas9_Cr;...............................................N.... Cpf1_Wa;....N...........LN........................k......... Cpf1_Cr; ACADMtccagtccccctaattagaagagccTTgggaacttggtttaatgaacacaca SEQ_DUP;.........................12......................... DUP_NUM;A............aB.a.A....aaaB....aA.....Jab.J......... Cas9_Wa;.....Caah..aaaaa............Aa.......A.............A Cas9_Cr;......N.....k..N......M.L....L.......N..........N... Cpf1_Wa;.......LLN.........N..LN.................N........N. 
Cpf1_Cr; ACADMaaaactaatgagggatgccaaaatcTTatcaggtaaggttaaagatgatttt SEQ_DUP;.........................12......................... DUP_NUM;.....a.aaaC.A..............aA...aA.....aC.aC.......a Cas9_Wa;.........A............Aa....Ca..ICa................. Cas9_Cr;.k...........M..................................N.M. Cpf1_Wa;.............N.....k.....................N.........N Cpf1_Cr; ABCA4ttacgatgtcccagaggaagttggtCCacccagtaggtggtggggctcactc SEQ_DUP;.........................12......................... DUP_NUM;aC.A.....a.aaB.A..aA........A..aa.aa.aaaA........... Cas9_Wa;..A.a...A.....aaa.............aa.aaaH.............A. Cas9_Cr;.......................N........L..........N....L... Cpf1_Wa;....................................M.k....L........ Cpf1_Cr; AGLgaactgaccaatgagaatgcccagtAActgtcctttcagctgtgaaacacaa SEQ_DUP;.........................12......................... DUP_NUM;.a......aHaB..A....A.....A........A..a.aB.......A... Cas9_Wa;.Baa....A...Aa..........Aaa.....A...aa..Ba..A....... Cas9_Cr;...L...N..........NLL............................... Cpf1_Wa;....N.................Lk.....N.........N............ Cpf1_Cr; FLGtgataatgataagaactagaactgtGGaggactgccacgtgactgtattcct SEQ_DUP;.........................12......................... DUP_NUM;...aC...aB....aB...a.aa.aa...A....a.a...a......Ha.a. Cas9_Wa;a...................A.....A.........A..Aa.a....A.... Cas9_Cr;.k.....kL.............N............................. Cpf1_Wa;....L...............................M.......N....... Cpf1_Cr; GBAcccacgacactgcctgaagtagaagCCaatcctgtgaggctgccagccatga SEQ_DUP;.........................12......................... DUP_NUM;.a.....A...aB.A..aB.A........a.a.aA..A...A....aC.A.. Cas9_Wa;.....Aaa.a..A.a..Aa...........Aa..CaaHI.....A..Aa..A Cas9_Cr;....N............................................... Cpf1_Wa;.N....N......................................k...... Cpf1_Cr; ASPMggtgcatattttgaatatcctttcgTTactttaaagcctctgtaataagact SEQ_DUP;.......HaB..........A..........A.....A......a..JA... 
Cas9_Wa;..Aaa....A..J.........CaaH.Ba....A.......Aa.a....... Cas9_Cr;........N........................k.....M....kL..N... Cpf1_Wa;...........k..........N..N.......M...k...N.......... Cpf1_Cr; ADCK3gatgagccttttgattttggcactcAAgagcaccaccgagaagatccacaac SEQ_DUP;.A...I..aC..J.aA......Ja.A.......a.aB.aC.......I..aC Cas9_Wa;aa.a.......Aa............A.a.aH....A.aa.aa.......Caa Cas9_Cr;..............NL.................k.....k............ Cpf1_Wa;....N..............N........N....................... Cpf1_Cr; HAAOaagggcagttcccagcggcctcaccAAgggatgggctttcctgttctgtact SEQ_DUP;.........................12......................... DUP_NUM;.aA..........A..I..aC...Ha.a.aB.aB.aC....aB....aHa.a Cas9_Wa;a...A.......A.a..A...A....A......................... Cas9_Cr;..........kL..N.....................N...........N... Cpf1_Wa;......N.....LN..N..M.N..Lk..........LN....LN........ Cpf1_Cr; MSH6tgaaaaggctcgaaagactggacttAAttactcccaaagcaggctttgactc SEQ_DUP;.........................12......................... DUP_NUM;..aA...aB..a...aa.................A..aA....a..I..aC. Cas9_Wa;..Aa.........A.a......A....A.......A.aaa....A...A... Cas9_Cr;................k.............................N...N. Cpf1_Wa;....N.........k...........................N......... Cpf1_Cr; MSH6agtgtgcaggctcacaccaattgatAAgagtgtttactagacttggtgcctc SEQ_DUP;.........................12......................... DUP_NUM;.A..aAJ...........aC..Ha.a.A.......a....aa.A.....a.. Cas9_Wa;.A.........AJ.JA.a.a.aa.................JA....A..... Cas9_Cr;.....N..N.....L............................N........ Cpf1_Wa;..M.N................................N..N........... Cpf1_Cr; DYSFgcgaaaaaatgaggattcgtatcatAAgactggtgagttctgagtcttggag SEQ_DUP;.........................12......................... DUP_NUM;......aHaaC...A........a...aaHa.A...Ha.A....aa.A.... Cas9_Wa;a.....A..J...........BaH..Ca.J....A........Ba.....a. Cas9_Cr;..NL.......kLL...L....................NL...M........ Cpf1_Wa;..M.N......................L........................ 
Cpf1_Cr; DYSFctcccagaagacccagccatccccaTTgcccccaagacagttccaccagctg SEQ_DUP;.........................12......................... DUP_NUM;..aB.a.....A...........A.......a...A........A..aA..A Cas9_Wa;.Caaaa.aaa......AaaH.Aa.Caaaa....Aaaaa....A...Baa.aa Cas9_Cr;........NL......k.M.L....L.................L....N... Cpf1_Wa;............N...........................L........... Cpf1_Cr; DYSFtggaaatgaccttggagattgtagcAAgagagtgagcatgaggagcggcctg SEQ_DUP;.........................12......................... DUP_NUM;...a.....aa.aC..A..A...aHa.a.a.A...a.aa.a.aA...A..aA Cas9_Wa;A...A.........Aa.............A...........A........A. Cas9_Cr;........L.........................N......N.......... Cpf1_Wa;....N.......................................L....... Cpf1_Cr; ALMS1agtaccttcaggttccttctcacatAAgagagaagcccagtattttctatca SEQ_DUP;.........................12......................... DUP_NUM;......aA...............a.a.aB.A....A...............a Cas9_Wa;aa..A....Aa.ba....Baa.Ba.a.a............Aaa.......ba Cas9_Cr;...........L.................NL....NL..N............ Cpf1_Wa;..M.N......N..................N..............M...... Cpf1_Cr; ALMS1aacctctacttcttactcacaacatAAcagagaagccgagtattttctacca SEQ_DUP;.........................12......................... DUP_NUM;.........................a.aB.A.Ha.A...............A Cas9_Wa;.A.....Aa.a..A.Ba...A.aHa..A....A.......Aa........ba Cas9_Cr;................................N..N................ Cpf1_Wa;..M.N......N..................N..................... Cpf1_Cr; ERCC3aaaatttaaatccacaattttgtcaTTcttcgcggacgagggtcgcagtcaa SEQ_DUP;.........................12......................... DUP_NUM;.................A.........a.aa..aHaaA..A..A........ Cas9_Wa;.Ca..I.........CaaHa........a..BaHBa.a...A......a.a. Cas9_Cr;L........N..................k.....L.......k.....N..N Cpf1_Wa;............L................k...................... Cpf1_Cr; MMADHCacactctactctggcactttcaaagTTaagtttctgcactgttaatttcttg SEQ_DUP;.........................12......................... 
DUP_NUM;......J.aA..........A....A..J..A....A..........A.... Cas9_Wa;....a.a.a.aH.A.aH..A.a..Ba...........Ba..A.a........ Cas9_Cr;..N................N.....................kL.....N... Cpf1_Wa;k.....N...............N...........N................. Cpf1_Cr; NEBctgcatctcaggagtgacaggggttGGcggtggctttccccacattttcttt SEQ_DUP;.........................12......................... DUP_NUM;......aa.a.a...aaaA..aa.aa.aA................J..A... Cas9_Wa;.A.aHa..A.Ca.a........A.........A.....A..Baaaa.a.... Cas9_Cr;..k..N...N.....k..............................N..... Cpf1_Wa;...................................k.........LLN.... Cpf1_Cr; GALNT3taaaaatgtgagcgtttcagctgttGGcgactgttgctcctagcaaccgagc SEQ_DUP;...a.a.a.A.....A..A..aa.a...A..A......A.....a.A..a.a Cas9_Wa;.A.Ca............A...Ba..A......A..A.....A.aa...A..A Cas9_Cr;...M.M...N............................kL......N..... Cpf1_Wa;.......................N............................ Cpf1_Cr; TTNccatgacagacacggccttggtcccGGctggcatttttcactgttaaagtgt SEQ_DUP;.........................12......................... DUP_NUM;a.J.a....aA....aA....aA..aA...........A.....a.A..... Cas9_Wa;.....Aa....A...A.a..Aa.....aaa..A...A.....Ba.a...... Cas9_Cr;.......N...........k....................N....L...... Cpf1_Wa;........................k........................... Cpf1_Cr; TTNgatcagtgactgtatatcttgttttCCacacatgcctctgcattcaccttgt SEQ_DUP;.........................12......................... DUP_NUM;.a.a...A........A...J........A.....A..........a.A... Cas9_Wa;.......Ca.....A......Ca......Baa.a.a...Aa.a..A..BaHa Cas9_Cr;.k....N..NL.......N.................M.M..N....kLL... Cpf1_Wa; .........................................N......... Cpf1_Cr; OSBL1ccacgggccagctgacctttgactgTTgacgttggccacggtgcgcacccgg SEQ_DUP;.........................12......................... DUP_NUM;aaA...A..a......a...A..a..A..aA....aaJa.A.....aA.... Cas9_Wa;a.Baaaa.a...Aa..A...Aa.....A......A.....Aa.a....A.aJ Cas9_Cr;......L............NLL...................k......N... 
Cpf1_Wa;.................................................... Cpf1_Cr; CHRNGtcttccgttccgcctgctctatctcAAgtcacctacttccccttcgactggc SEQ_DUP;.........................12......................... DUP_NUM;..A....A...A...........A.................a...aA..aB. Cas9_Wa;..Aa.Ca.Baa..Baa.aa..A.a..Ca.a....a.aa..A.Baaaa.Ba.. Cas9_Cr;.M.......................NLL..NLL..........M........ Cpf1_Wa;....N............................N.................. Cpf1_Cr; GLB1ccccgtacccgggtcccgcagacttAAcgcgcaagccgcgcgtagggcccag SEQ_DUP;.........................12......................... DUP_NUM;A....HaaA....A..a.......a.A...A..a.a.A..aaA....aB.A. Cas9_Wa;..Aa.aaaa...Aaah...aaa.a...A....A.a.a...Aa.a.a...... Cas9_Cr;........L..............L.............L........N..... Cpf1_Wa;....N......N...................N........N......N.... Cpf1_Cr; COL7A1tcccacagctccagtaggtccagtcAAggccctggaggaagagaaagttcag SEQ_DUP;.........................12......................... DUP_NUM;...A.....A..aA....A....aA....aa.aaB.a.aB..A....aaA.. Cas9_Wa;......aaa.a..A..aa.......aa...a....Aaa.............. Cas9_Cr;................L........L........L........L........ Cpf1_Wa;....N.......L...LN....k.................N........... Cpf1_Cr; COL7A1acccacagcaaatagcttgacccccTTgccccttcagcctttgggcagctgt SEQ_DUP;.........................12......................... DUP_NUM;...A.....A...a.........A........A.....aaA..A..a.A... Cas9_Wa;.aa...AaaHa..A......A....Aaaaa...Aaaa.Ba..Aa......A. Cas9_Cr;.....................N.................N........N... Cpf1_Wa;..............................................L..... Cpf1_Cr; DNAH1ggaacaccgtcaccccgcggctgatGGcgtcacttcaactacctgtctttcg SEQ_DUP;.........................12......................... DUP_NUM;....A.......a.aA..aC.aa.A...............A......A..a. Cas9_Wa;...A.....A.aa..a.aaaa.a..A......A..a.a.Ba..A..Aa...a Cas9_Cr;.................................................... 
Cpf1_Wa;...............N......................L............N Cpf1_Cr; AGTR1gtcatgattcctactttatacagtaTTcatctttgtggtgggaatatttgga SEQ_DUP;.........................12......................... DUP_NUM;.aC.......J.......A...........a.aa.aaaB......aaB.... Cas9_Wa;.......a.....Baa..A......A.....BahCa................ Cas9_Cr;.............N.......k........NL......k.M.......NL.. Cpf1_Wa;...................LLN......LLK...............M..... Cpf1_Cr; PCYT1AggagggggagcgctcgcgagtagggCCtgctgctggggctctgcttcgggct SEQ_DUP;aaaa.a.A...aHa.A..aaA...A..A..aaaA....A....aaA..aC.a Cas9_Wa;...............A.a.a.a........Aa..A..A.....A.a..A.Ba Cas9_Cr;.................................................... Cpf1_Wa;.................................................... Cpf1_Cr; WDR19acattttgtggtcatttctactcatAActggagagcttggtcaagagatatt SEQ_DUP;.........................12......................... DUP_NUM;...a.aA..................aa.a.A...aA....a.aC.......a Cas9_Wa;......A..........a...Ba..A.aH...A.......A.....a..... Cas9_Cr;.......N......kL............k.........k............. Cpf1_Wa;..M.N...L............N....M..............N...M.k...M Cpf1_Cr; SCARB2ccctccatagaaagaagcaaaacttAAcacaaagtcatctaattttttgaca SEQ_DUP;.........................12......................... DUP_NUM;.....aB..aB.A......J.........A..............a....... Cas9_Wa;A...aaaa.aa...........A....A....A.a.....a.Ca........ Cas9_Cr;......................L....L..................N..... Cpf1_Wa;....N....k.........N..........M.k...........LLN..... Cpf1_Cr; FAM175AactactaccagtatctgctttagatCCgtttgtcttgtgtatctaacaaccg SEQ_DUP;.........................12......................... DUP_NUM;......A.....A.....aC...A...A....a.A............a.aC. Cas9_Wa;......A..A..Aa....Ca.JA......CaaH.....a.......Ca.J.A Cas9_Cr;...k........k.M..N...N.............M......k.....L... Cpf1_Wa;.......................N..N.............LN.......... Cpf1_Cr; MFSD8gcaataacccagcccaaaaaacttgTTatcagctgtcggatcaatctgcaga SEQ_DUP;.........................12......................... 
DUP_NUM;.......A............A......A..A.HaaC.......A..aB..J. Cas9_Wa;......A.....AaaH.Aaa......A......Ca..A...a...Ca..Ca. Cas9_Cr;....k...........M.M..........................N..N.M. Cpf1_Wa;................L....N........k.................N..N Cpf1_Cr; ETFDHtttgtgcagcatatcagtgctttcaTTgccttaaaaattaagaaaaattatc SEQ_DUP;.........................12......................... DUP_NUM;.A..A.......a.A........A.............B.............. Cas9_Wa;...........A..a...Ca....A..ba....Aa................. Cas9_Cr;..............kL........k..........M........kL..N... Cpf1_Wa;...........k....../n..k........................N.... Cpf1_Cr; AGAacaaactaagaagtcataccttggcAAggaagcgcatcaatatatcaccatt SEQ_DUP;.........................12......................... DUP_NUM;.....aB.a.........aA...aaB.B.a.A.................... Cas9_Wa;...BaHa...A........a...Aa....A.......A.a.Ca......Ca. Cas9_Cr;N...L...............NL.....................N........ Cpf1_Wa;....N..LN........N..M............................... Cpf1_Cr; DNAH5agttcttggtttgcacgggaatgggTTaaatttacagatgctatctccaagg SEQ_DUP;.........................12......................... DUP_NUM;...aAJ..A...aaaB.HaaA...........aC.A..........aA.... Cas9_Wa;........Ba........A.a..................A.....A..Ca.a Cas9_Cr;...kL........N..N.M......N..N....k..............N... Cpf1_Wa;......k....................N........k.....k.......N. Cpf1_Cr; SLC25A46agatatccccggcagccgcaacctgCCactggggcgagaagagcccgcccta SEQ_DUP;.........................12......................... DUP_NUM;......aA..A..A......A.....aaaa.a.aB.a.A...A......aa. Cas9_Wa;aaaaa.....Caaaa..A..Aa.a..Aa..Aa.a.....A........Aaa. Cas9_Cr;..................L........M.L...................... Cpf1_Wa;.................N.................................. Cpf1_Cr; SLC22A5ctgctgcaatatttgccccggcgctAAttccatggccactgccctcttcctg SEQ_DUP;.........................12......................... DUP_NUM;.S........A....aa.A..........aA.....A.........Haaa.a Cas9_Wa;..A..A..A..A........Aaaa..A.a....Baa....Aa.a..Aaa.a. 
Cas9_Cr;......M....N.......................k..............NL Cpf1_Wa;....N............................................... Cpf1_Cr; RAD50aattatcactttcttcagcccccttAAcaattttggttggacccaatggggc SEQ_DUP;.........................12......................... DUP_NUM;.............A................aA..aa.......aaaa.aaaB Cas9_Wa;...A......Ca.a..Ba.Ba..Aaaaa....A.............AaaH.. Cas9_Cr;.........................N.M.....k..NL........N..... Cpf1_Wa;....N..N.........L.....N........LLk........N........ Cpf1_Cr; RAD50agaaacaagagaaacagcacaagttAAgacacaggtaatacagtctgtgtcc SEQ_DUP;.........................12......................... DUP_NUM;....a.aB.J..A.....A..J.a.....aa.......A...a.A....J.. Cas9_Wa;Aa........A........A..A.a.........A.a........A...a.. Cas9_Cr;......L.......................................N..... Cpf1_Wa;....N..........N.........................M.......... Cpf1_Cr; RAD50gagactcatgagacaagatattgatAAcacagaaggtaggtctgttttgctt SEQ_DUP;.........................12......................... DUP_NUM;.....a.a....aC....ac.......aB.aA..aA...A....A.....aC Cas9_Wa;.........A.aH.....A.............A.a...........a..... Cas9_Cr;.....................M.....................N........ Cpf1_Wa;..M.N......N......................M................. Cpf1_Cr; GRXCR2tgcaaatctggcaaggctgtaggccAAttctcattgcaggcagggcacctca SEQ_DUP;.........................12......................... DUP_NUM;.....aA...aA..A..aA............A..aA.JaaA.......aaA. Cas9_Wa;.......A...Ca..IA....A......Aa...BaHa....A...A....A. Cas9_Cr;..................N...N...........................N. Cpf1_Wa;....N.................................M...........k. Cpf1_Cr; DSPagagaattgaagagaggtgcaggcgTTaagctggaggattctaccagggaga SEQ_DUP;.........................12......................... DUP_NUM;B...aB.a.a.aa.A..aa.A....A..aaHaaC........aaa.a..... Cas9_Wa;..Ca....................A..JA......A........BaH.Aa.. Cas9_Cr;.........L...................N..................N... Cpf1_Wa;......N....L..............L...................k..... 
Cpf1_Cr; GCM2cacaaattgtgttctcacctgacctAAcctgaaaaaagatcgcgttgccatc SEQ_DUP;.........................12......................... DUP_NUM;aaaa.aA.........a.........aB.....aC..a.A..A.......A. Cas9_Wa;CaahCa.a.........BaJa.aa...Aa...Aa..........Ca.a.... Cas9_Cr;.................M.L.........N....N................. Cpf1_Wa;....N....Lk....................N.......k............ Cpf1_Cr; AARS2ctgacatcccagggaagatgccatcAAgagatgcagacactgagtgtgcgga SEQ_DUP;.........................12......................... DUP_NUM;.......aaaB.aC.A.......a.aC.AJ.a....Ha.a.a.aHaaC.a.a Cas9_Wa;....Aa...A.Caaa..........Aa.Ca........A...A.a....... Cas9_Cr;....NLL.......................L..................... Cpf1_Wa;....N.......................L....................... Cpf1_Cr; MUTcatggatggttctggaggaaaaatgTTatgtatttcgaaccataaattcctt SEQ_DUP;.........................12......................... DUP_NUM;aC.aA....aa.aaB.....A....A......aB................A. Cas9_Wa;....Ba.........Ba......................Ba...Aa...... Cas9_Cr;L.....M..............kL.........N...............N.M. Cpf1_Wa;...............LN...M.k............M.........L...... Cpf1_Cr; PKHD1aacttcacacacctttaatgtgcagTTaagttgaggatgcttgtgttagtgt SEQ_DUP;.........................12......................... DUP_NUM;.J.............a.A..A....A..aHaaC.A...a.A...a.A....A Cas9_Wa;a..A...A.Ba.a.a.aa.........A..J.............A....... Cas9_Cr;..........N...............NL.........k..........N... Cpf1_Wa;......N...............................N...N......... Cpf1_Cr; PKHD1agtgaatgctgaccccattgatagaGGacggaaattctgtggagaccagctg SEQ_DUP;.........................12......................... DUP_NUM;B..A..a........aC..a.aa..aaB......a.aa.a....A..aA..a Cas9_Wa;H............A...Aaaa............A......BaH.......Aa Cas9_Cr;..................N.....................N........... Cpf1_Wa;........LLk........L................................ Cpf1_Cr; WISP3cttctgcttgctggcctggcacaggTTaagtcctctcccccgactctttccc SEQ_DUP;.........................12......................... 
DUP_NUM;.A...A..aA.J.aA....aA....A...........a.............. Cas9_Wa;aa.a.aHBa..A...A...Aa...A.a.........aa.a.aaaaa..A.aH Cas9_Cr;..............N...L.....N.....N.................N... Cpf1_Wa;......N............................L................ Cpf1_Cr; BRAT1ctggctccccaaagagccctggtagTTaactccccctgctgggaagcaaaaa SEQ_DUP;.........a.A....aA..A............A..aaaB.A........aB Cas9_Wa;....aa...A.aaaa......Aaa..........A.aaaaa..A.......A Cas9_Cr;.....N.......................L..................N... Cpf1_Wa;......N.............LLN...k.......N................. Cpf1_Cr; ISPDgtggtccgcgccgcgctgaccactcAAggcaaggacccggctccgccggcct SEQ_DUP;.........................12......................... DUP_NUM;...a.A..a.A..a.........aA...aa....aA....A..aA...aa.A Cas9_Wa;.Aa......Jaa.a.aa.a.a...Aa.a.aH...A.....AaaH.A.aa.aa Cas9_Cr;............................L....................... Cpf1_Wa;....N....N.......................................... Cpf1_Cr; FAM126AactttgtggctcctggataactttaTTagagagatgaaactaaagaactctt SEQ_DUP;.........................12......................... DUP_NUM;.a.aA....HaaC...........a.a.aC.aB.......aB......A... Cas9_Wa;JAa.a.a.......A.aa.......A..................A....... Cas9_Cr;................N.........k..................k..N... Cpf1_Wa;  Cpf1_Cr;SLC25A13 gtagaaccatcgctgtagcaattcgTTaagtcagcaaagttacaccaaactg SEQ_DUP;.........................12......................... DUP_NUM;B......A..A..A......A....A...A....a............aaaa. Cas9_Wa;Ca..I......Aa.Ca.a.....A...BaH......a..A.......A.aa. Cas9_Cr;..N.........L...............................NL..N... Cpf1_Wa;......N.......k..........k................Lk........ Cpf1_Cr; PEX2attcctgatttcagtggctgcagacTTgtgtacttctgtgccacacttagga SEQ_DUP;.........................12......................... DUP_NUM;..aC.....a.aA..A..a...Ja.A.......a.a.........aa....A Cas9_Wa;Ca.a...Baa.....Ba.....A..A...A.......AJBa....Aa.a.a. Cas9_Cr;......k...N.............NL......kL..............N... Cpf1_Wa;................................N...k...........k... 
Cpf1_Cr; NBNgaactttcacatcaatttctaactcTTggttttgtgtccttgaataactgtt SEQ_DUP;.........................12......................... DUP_NUM;.......................aA....a.A....HaB......A..J... Cas9_Wa;........A..Ba.a.Ca....Ba...A.aH...........aaJ....... Cas9_Cr;.n.......NL..........k......kL.........k........N... Cpf1_Wa;....................LN..N........N.................. Cpf1_Cr; NDUFAF6cctgtggccattgaactatggaaggTTaaaaaaaaaaaaataccacttttaa SEQ_DUP;.........................12......................... DUP_NUM;.aA.....aB.....aaB.aA...........J..................A Cas9_Wa;.Ca..Aa.....Aa......A..........................Aa.a. Cas9_Cr;...........L.....................N......M.......N... Cpf1_Wa;N.....k......................N........LN............ Cpf1_Cr; VPS13BctctagagcgccagagaagattgttAAcatttaaaatgttcatcactcagtt SEQ_DUP;.........................12......................... DUP_NUM;.a.a.A...a.aB.aC..A..............A...........A....A. Cas9_Wa;.Ba.aa.a.....A.aa...............A...........Ba.Ca.a. Cas9_Cr;.......N..A...L...k........................N..N..... Cpf1_Wa;....N......k...................N.................... Cpf1_Cr; SPAG1aaaagaaggaaactgcagtggctgcAAttcaagattgtaacaggtaaactgc SEQ_DUP;.........................12......................... DUP_NUM;aB.aaB....A..a.aA..A.....I..aC..A.....aA...J..A...A. Cas9_Wa;..Aa.............A..A.....A..A...BaH.........A...... Cas9_Cr;L................N................................NL Cpf1_Wa;....N....N.......N......k.......................N... Cpf1_Cr; SLC52A2cctgcaggttggagcctcccctcttAAcgtctctgtgcttgtggctctgggg SEQ_DUP;.........................12......................... DUP_NUM;..aA..aa.A..............A.....a.A...a.aA....aaaaB... Cas9_Wa;aa.Baaa..A.........Aa.aaaa.a....A..a.a....A..J...A.a Cas9_Cr;.................L..NLL........N........L.....N..... Cpf1_Wa;....N.......................LLN..................... Cpf1_Cr; CDKN2AacccacctggatcggcctccgaccgTTaactattcggtgcgttgggcagcgc SEQ_DUP;.........................12......................... 
DUP_NUM;...HaaC..aA.....a...A..........aa.a.A..aaA..a.A..... Cas9_Wa;aaHa..AaaHaa....Ca..Aa.aa..Aa.....A...BaH...A..J...A Cas9_Cr;......L..................................L......N... Cpf1_Wa;......n............................................. Cpf1_Cr; APTXaaaatctacaatcaccttttcccccAAcagtgtgcatatgcttaaggagttc SEQ_DUP;.........................12......................... DUP_NUM;.........................a.a.A.....A.....aa.A.....aa Cas9_Wa;aaH.A....Ca..a..Ca.aa...Baaaaa..A......AJ.J..A...... Cas9_Cr;.........k...............................kLL........ Cpf1_Wa;....N.........M.......N........N.................... Cpf1_Cr; DNAI1ccccataaacagcctcataagcaggTTaacgtacgcacaccttccttctgat SEQ_DUP;.........................12......................... DUP_NUM;.......A........A..aA..J..AJ.JA..............aC.a... Cas9_Wa;...A.aaaa.....A..Aa.a.....A.......A...A.aJa.aa.Baa.B Cas9_Cr;...NL.....N.....................L...............N... Cpf1_Wa;......N............................................. Cpf1_Cr; FANCCtagtctgtgctctctgctgcctcccAAtcacgggggccgtagtagaaggcca SEQ_DUP;.........................12......................... DUP_NUM;..a.A......A..A............aaaaA..A..A..aB.aA....a.A Cas9_Wa;aHaa.....a....A.aJa..A..Aa.aaa..Ca.aI....Aa......... Cas9_Cr;.............................................L...... Cpf1_Wa;....N...................N.....N..................... Cpf1_Cr; MUSKcaacattccactggtacatattcttAActctggttgccttcagcggaactga SEQ_DUP;.........................12......................... DUP_NUM;......J.aA.................aG..A......a.aaB...a.aB.. Cas9_Wa;A.a..a..A..Baa.a.....A..J.BaH...A.aH.....Aa.Ba..A... Cas9_Cr;...N........................NLL............N..N..... Cpf1_Wa;....N..................LLN......k.......k........... Cpf1_Cr; TSC1tgctgtgcgcgtctgctccctgctgTTatcagtctgtccagcacttccattg SEQ_DUP;.........................12......................... DUP_NUM;a.a.a.A...A......A..A......A...A.J..A..........aaaa. 
Cas9_Wa;A.aa...A....A.aJ.a..A.aaa..A.....Ca...a...aa..A.a.Ba Cas9_Cr;...L..N...N..NL.......N.................L.......N.M. Cpf1_Wa;................................L................... Cpf1_Cr; ADAMTS13ggcagcaggtgctctactgggagtcAAgagagcagccaggctgagatggagt SEQ_DUP;.........................12......................... DUP_NUM;A..aa.A.......aaa.A....a.a.A..A...aA..a.aC.aa.A....a Cas9_Wa;aa.....A..A.....A.aJ.A.......a.......A..Aa...A...... Cas9_Cr;.N..............NLL................................. Cpf1_Wa;....N.....................L....................LN... Cpf1_Cr; AGPAT2gtacatgatgaggccccacgggccccAAggaagagcagctcccgcttgcgat SEQ_DUP;.........................12......................... DUP_NUM;..aC.a.aA.....aaA......aaB.a.A..A.....A...aa.aCJ..A. Cas9_Wa;aa......A..J......Aaa.a...Aaaa.........A..A.aaa.a... Cas9_Cr;N................................................... Cpf1_Wa;....N..LN......................................LLN.. Cpf1_Cr; C10orf11attgcaggtcactggaaggactgagCCgcattcaggagcctggaggaactca SEQ_DUP;.........................12......................... DUP_NUM;..aA.....aaB..aa...a.A..A......aa.A...aa.aaB........ Cas9_Wa;....a....A....a.a........A....Aa.a..BaH....Aa....... Cas9_Cr;......kL....N.....k.....N........................... Cpf1_Wa;....................L...LN.........L...N..N......... Cpf1_Cr; PTENgaaagggacgaactggtgtaatgatAAtgtgcatatttattacatcggggca SEQ_DUP;.........................12......................... DUP_NUM;aaa..aB...aa.A....aC....a.A...............aaaA...... Cas9_Wa;..A..........A...A........J.........A..J.......A.Ca. Cas9_Cr;........NL.......................................... Cpf1_Wa;..M.N......M..................k.......k.......N..... Cpf1_Cr; PTENtgtactttgagttccctcagccgttAAcctgtgtgtggtgatatcaaagtag SEQ_DUP;.........................12......................... DUP_NUM;...Ha.A........A..A.......a.a.a.aa.aC.......A.Ha.A.. Cas9_Wa;..Ba.....A..J....Baaa.a..Aa.....Aa.......J.J....Ca.. Cas9_Cr;...................NL........k....NLL.........N..... 
Cpf1_Wa;....N..............M....k.................k.....N..N Cpf1_Cr; PNPLA2ggacagctccaccaacatccacgagCCtgcgggtcaccaacaccagcatcca SEQ_DUP;.........................12......................... DUP_NUM;.A................a.A...aHaaA...J........A......A... Cas9_Wa;aa.a....A..A.aa.aa..A.CaaHa...Aa..A....a.aa..A.aa..A Cas9_Cr;..................L............L.........L.......... Cpf1_Wa;.................N.................N........N....... Cpf1_Cr; KCNQ1cctcgagcgtcccaccggctggaaaTTgcttcgtttaccacttcgccgtgtg SEQ_DUP;.........................12......................... DUP_NUM;a.a.A.......aA..aaB....A....A...........A..a.aHa.A.. Cas9_Wa;..A.Baa.a...A..aaa.aa..A.........A.Ba.....Aa.a.Ba.aa Cas9_Cr;..L..................NL..........L..............N... Cpf1_Wa;Lk.................................................. Cpf1_Cr; HBBttgtccaggtgagccaggccatcacTTaaaggcaccgagcactttcttgcca SEQ_DUP;.........................12......................... DUP_NUM;...aa.a.A...aA..........J.aA...Ja.A.........A....a.A Cas9_Wa;.........aa.......Aa...Aa.Ca.a.......A.aa...A.a..Ba. Cas9_Cr;.................N.....N...L....................N... Cpf1_Wa;......k............................................. Cpf1_Cr; HBBcttcatccacgttcaccttgccccaCCagggcagtaacggcagacttctcct SEQ_DUP;.........................12......................... DUP_NUM;......A........A........aaA..A....aA..a...........aa Cas9_Wa;.aa..A.Ba.CaaHa..Ba.aa...Aaaa.aa....A.....A..A...A.B Cas9_Cr;........................NL...L....NL....N........... Cpf1_Wa;..............N..................................... Cpf1_Cr; HBBtgccccacagggcagtaacggcagaCCttctcctcaggagtcagatgcacca SEQ_DUP;.........................12......................... DUP_NUM;.....aaA..A....aA..a............aa.A...aC.A......aa. Cas9_Wa;a.aa...Aaaa.a....A.....A..A...Aa.Ba.aa.a......a..... Cas9_Cr;......NL...L....NL....N...........................N. Cpf1_Wa;.................................................... 
Cpf1_Cr; ANO5gagaactgtagcttctaaagctcatAAgcataggtgtttggcaagacattct SEQ_DUP;.........................12......................... DUP_NUM;...A..A........A.......A....aa.A...aA...a........... Cas9_Wa;..A.......A.....A.Ba.....A.a.....A..........J.A....A Cas9_Cr;..M.N...M............N..............LLN............. Cpf1_Cr; MYBPC3cctcgatcatgcgccgcgcttcatgAActcagctcctgaatcaggtcgaagt SEQ_DUP;.........................12......................... DUP_NUM;aC....a.A..a.A......aB.....A....HaB....aA..aB.A....A Cas9_Wa;a.a.aaa.a..Ca...A.aa.a.a.Ba.....A.aH.A.aa....Ca..I.a Cas9_Cr;..........................................NL........ Cpf1_Wa;...LN...........LN........LN........................ Cpf1_Cr; MYBPC3cgccatcgtaggcaggcggctcccaCCtgtactgtgcaggagtcctctccca SEQ_DUP;.........................12......................... DUP_NUM;...A..aA..aa.aA......J..A....a.A..aa.A...........A.. Cas9_Wa;A..Aaa.aa.Ca.....A...A..A.aaa.aa....A..J.A..J...aa.a Cas9_Cr;............................................L....... Cpf1_Wa;.................................................... Cpf1_Cr; MYBPC3aacacccactcatcgctgtcacctgTTgtcctctggggcatctggggctggc SEQ_DUP;.........................12......................... DUP_NUM;..........A..A......A..A......aaaA.....aaaA..aA...aA Cas9_Wa;...a...A.aaaHa.aHCa.a...a.aa......aa.a.....A.Ca..... Cas9_Cr;..............N..N..............................N... Cpf1_Wa;.................................................... Cpf1_Cr; MYBPC3atggggtcatcgggggctccaggggTTaggaccattgagagctgctgagctt SEQ_DUP;.........................12......................... DUP_NUM;aA.....aaaaA.....aaaA...aa......a.a.A..A..a.A...a... Cas9_Wa;..Aa........a.Ca.....A.aa...........Aa........A..A.. Cas9_Cr;.........................................L......N... Cpf1_Wa;...........................................k........ Cpf1_Cr; BSCL2gactctggcagctcaagctctaaggTTaacacgatacggctgtccatacatc SEQ_DUP;.........................12......................... DUP_NUM;..aA..A.....A......aAJ....J.aC...aA..AJ.......J..A.. 
Cas9_Wa;a......A.aH..A..A.a...A.a.........A.a....A..A...aa.. Cas9_Cr;..........N...NL................................N... Cpf1_Wa;N.....N.....M...........M................k.......... Cpf1_Cr; BBS1cctgcgaagatggccgctgcgtcctCCatcggattccgacgcctgcggagct SEQ_DUP;.........................12......................... DUP_NUM;.aB.aC.aA..A..a.A.......IHaaC....a..A...a.aa.A..a.a. Cas9_Wa;A..A.aa..A........Aa.a..A..aa.aa.Ca....Baa..A.aa..A. Cas9_Cr;.........N......................................L... Cpf1_Wa;.........L...............L............LN............ Cpf1_Cr; LRP5tgactgtatgcacaacaacgggcagTTgtgggcagctgtgccttgccatccc SEQ_DUP;.........................12......................... DUP_NUM;.AJ..A.........aaA..A..a.aaA..A..a.A....A........aa. Cas9_Wa;a.a.....A......AJa..A..A...A.........A..A....Aa.J.Aa Cas9_Cr;....L.........................M.................N... Cpf1_Wa;.................................................... Cpf1_Cr; TYRcggcgatggtaggggccgtcctcacTTgccctgctggcagggcttgtgagct SEQ_DUP;.........................12......................... DUP_NUM;aC.aA..aaaA..A.........A....A..aA..aaA...a.a.A...A.. Cas9_Wa;.....A..A...........Aa..aa.a.a...Aaa..A...A....A.... Cas9_Cr;.................N..............................N... Cpf1_Wa;..............................................N....k. Cpf1_Cr; TYRggcagggcttgtgagcttgctgtgtCCgtcacaagagaaagcagcttcctga SEQ_DUP;.........................12......................... DUP_NUM;aaA...a.a.A...A..a.A...A......a.aB..A..A......aB.aB. Cas9_Wa;a..A...A....A.......A...A.....aaJ.a.a.........A..A.B Cas9_Cr;...............................N.......N........L... Cpf1_Wa;...........N....k............LN..k.................. Cpf1_Cr; MRE11AtccacaaatttctggctaaagcgaaGGaacactgaaaggttcaaaacctcca SEQ_DUP;.........................12......................... DUP_NUM;.........aA.....a.aB.aaB.....aB..aA................. Cas9_Wa;AaaH.CaaHa.....Ba...A.....A.......A.a........Ba....A Cas9_Cr;......k....N............L.......k................... 
Cpf1_Wa;.LN..LN.....Lk.......k...........M..........Lk...... Cpf1_Cr; ATMacagaacaatcccagcctaaaacttAAcatacacagaatgtctgagggtttg SEQ_DUP;.........................12......................... DUP_NUM;B.........A..........J.J......HaB..A...aHaaA...a.aA. Cas9_Wa;a.....A....A..CaaaI.Aa.....A....A...A.a.......a..... Cas9_Cr;..k.......k.....k..N.M...........L............N..... Cpf1_Wa;....N..M.......N...................N............k... Cpf1_Cr; ATMaggagtggaagaaggcactgtgctcAAgtgttggtggacaagtgaatttgct SEQ_DUP;.........................12......................... DUP_NUM;a.aaB.ab.aA....a.A.....a.A..aa.aa....aHaB....AJ..... Cas9_Wa;A...................A.a....A.aJ.......J....A........ Cas9_Cr;.................................................... Cpf1_Wa;....N.........L...N...LN........M...........M....... Cpf1_Cr; DPAGT1tgtggtagagcaatcccaaagtggtGGaaaaaaaagggtatcatgaagtaga SEQ_DUP;.........................12......................... DUP_NUM;A..a.A..........a.aa.aaB......HaaA......aB.A..a.aaaB Cas9_Wa;Aa.............A..CaaaI......................Ca.J... Cas9_Cr;.M.L...........N......N..............L.............. Cpf1_Wa;....LLK................LN.......LLk................. Cpf1_Cr; PKP2aggagaggttatgaagaatgcacacAAcaattctccgtggcctgagaaaaca SEQ_DUP;.........................12......................... DUP_NUM;a.aA....aBHab.JA................a.aA...a.aB.....aa.. Cas9_Wa;A........................A.a.a..A...BaHaa....Aa..... Cas9_Cr;..............L................N.M.................. Cpf1_Wa;....N..N.................k.........N..N...........M. Cpf1_Cr; PKP2ggcccgcctgctttcttggtggtgcAAgggtgtgcccagcctggcttctctg SEQ_DUP;.........................12......................... DUP_NUM;.A...A.......aa.aa.A..Haaa.a.A....A...aA.......aa..A Cas9_Wa;....A..Aaa.aa..A..Ba.........A..J......AaaJ.Aa...A.B Cas9_Cr;.M.................................k..N............. Cpf1_Wa;....N.....................................N......... 
Cpf1_Cr; AAASggtcccagaccatggagtgagcctcTTcccccaagcctgtgggtaaggacag SEQ_DUP;.........................12......................... DUP_NUM;...a.....aa.a.a.A.............A...aHaaA...aa...aA... Cas9_Wa;.a......aaa...Aa..........Aa.a.Baaaaa...Aa.......... Cas9_Cr;............NLL...........L.....................NLL. Cpf1_Wa;...........N...........N............................ Cpf1_Cr; MVKttcatggagaacatgccgtggtacaTTggcaaggtacaaagccgttagagcc SEQ_DUP;.........................12......................... DUP_NUM;.aa.aB....A..a.aA......aA.J.aA......A..A...a.A.....I Cas9_Wa;a.CaaHBa........A...Aa......A..J..A......A..J.Aa.... Cas9_Cr;.......L...............NL.......................N... Cpf1_Wa;.........N......k.................N................. Cpf1_Cr; TCTN2tccggccctgcggtcagcgcgtcccTTggtcggagacaccgagggtgtgacc SEQ_DUP;.........................12......................... DUP_NUM;A....a.aA...a.a.A......aA..aa.a.....aHaaa.a.a...a.A. Cas9_Wa;H.....aa..Aaa..A...a..A.a..aaa.....a.....A.aa....... Cas9_Cr;L...L.......k.M.L.......L....................L..N... Cpf1_Wa;..........L......................................... Cpf1_Cr; ATP6V0A2agtcgggcacggcctacgagtgcctCCagcgccctgggcgagaaaggcctgg SEQ_DUP;.........................12......................... DUP_NUM;aaA...aA....Ha.a.A......a.A....aaa.a.aB..aA...aA.... Cas9_Wa;aa..A...a...A.a..Aa..A.....Aa.aa..A.aaa....A........ Cas9_Cr;................NL..............................L... Cpf1_Wa;.....................k.............................. Cpf1_Cr; GJB2cttcctcttcttctcatgtctccggTTaggccacgtgcatggccactaggag SEQ_DUP;.........................12......................... DUP_NUM;.............A.....aA...aA....a.A...aA......aa.a.A.. Cas9_Wa;.....A.Baa.a.Ba.Ba.a....a.aa.......Aa.a...A..J.Aa.a. Cas9_Cr;....k.M...L....N........NL....N..N..........L...N... Cpf1_Wa;........................................L.....LN.... Cpf1_Cr; GJB2ccggtaggccacgtgcatggccactAAggagcgctggcgtggacacgaagat SEQ_DUP;.........................12......................... 
DUP_NUM;..aA....a.A...aA.......aa.a.A..aa.aJaa....aB.aC...A. Cas9_Wa;...a.aa......Aa.a...A..J.Aa.a.......A.a...A.....A.a. Cas9_Cr;...NL....N..N..........L............................ Cpf1_Wa;....N..............L.....LN..................M.....L Cpf1_Cr; CENPJccctttttaatgcaaggaaaggctgTTatgggtttcagatttatctgactgt SEQ_DUP;.........................12......................... DUP_NUM;.......A...aaB..aA..A...HaaA..I..aC.......a...a.aA.. Cas9_Wa;.....aaa.........A.........A...........Ba.......Ca.. Cas9_Cr;.......................L.....k..................N.M. Cpf1_Wa;......................................N............. Cpf1_Cr; BRCA2cttattttaactcctacttccaaggAAtgttctgtcaaacctagtcatgatt SEQ_DUP;.........................12......................... DUP_NUM;...................aaB..A....A.........A.I..aC...... Cas9_Wa;.A.a.aH........A.aa..A.Baa.........Ba...a...Aa....a. Cas9_Cr;M.......................N....k..........NLL......... Cpf1_Wa;N..LN..........k........................k...N...M... Cpf1_Cr; BRCA2attgtaaaaatagtcatataaccccTTcagatgttattttccaagcaggatt SEQ_DUP;.........................12......................... DUP_NUM;........A................aC.A...........AIHaaC...... Cas9_Wa;.Ba................a......Aaaa.Ba...........Baa...A. Cas9_Cr;..............N...k.....N...............M.......NL.. Cpf1_Wa;.....................N...........N....k....M.N....N. Cpf1_Cr; BRCA2atcaccagttttagccatcaatgggCCaaagaccctaaagtacagagaggcc SEQ_DUP;.........................12......................... DUP_NUM;...A.....A........aaA.....a.....J..A....a.a.aA...A.. Cas9_Wa;J.....Ca.aa........Aa.Ca......Aa.....AaaH......A..J. Cas9_Cr;.................................k.................. Cpf1_Wa;......k........k..................k........LN....... Cpf1_Cr; BRCA2caaacgaaaattatggcaggttgttAAcgaggcattggatgattcagaggat SEQ_DUP;.........................12......................... DUP_NUM;.aB.......aA..aA..A.....a.aA...HaaC.aC....aHaaC..... Cas9_Wa;....Aa...A...........A..........A....A..........BaH. 
Cas9_Cr;...................k.............N.M.......N..N..... Cpf1_Wa;....N..........L.............M........M.N.........M. Cpf1_Cr; BRCA2tttctgaaatagaagatagtaccaaGGcaagtcttttccaaagtattgttta SEQ_DUP;.........................12......................... DUP_NUM;.aB....aB.ac..A......aA...A...........A....A.......A Cas9_Wa;.......ba.................Aa.J..A....a...Baa........ Cas9_Cr;.N......................k........................... Cpf1_Wa;..N....N..........k...........k.....N..LN........... Cpf1_Cr; BRCA2gtttctttagagccgattacctgtgTTaccctttcggtaagacatgtttaaa SEQ_DUP;.........................12......................... DUP_NUM;.....a.a..aC......a.A..........aA...a....A.......... Cas9_Wa;.Ca.....Ba.......Aa.....Aa.......aaaH.Ba.......A.... Cas9_Cr;........N...k.M..........k...k.........N........N... Cpf1_Wa;.................N..........k.........k......N...... Cpf1_Cr; BRCA2tgcccctttcgtctatttgtcagacGGaatgttacaatttactggcaataaa SEQ_DUP;.........................12......................... DUP_NUM;......A.......A...a..aaB..A............aA.......A... Cas9_Wa;...A...Aaaa..Ba..a.......a...A.........A......A...A. Cas9_Cr;........kLL..N........N.......kL.......k............ Cpf1_Wa;....LLN..................N..k.......L.M........N.... Cpf1_Cr; BRCA2agatgtcttctcctaattgtgagatAAtattatcaaagtcctttatcacttt SEQ_DUP;.........................12......................... DUP_NUM;A.............a.a.aC.............A..............A... Cas9_Wa;...Ca......a.Ba.aa...................Ca.....aa....Ca Cas9_Cr;.........kL.......k.M.........N........N............ Cpf1_Wa;..M.N........k.........................k....LN...... Cpf1_Cr; ATP7BcaggagcacccgcatgaccctggggAAcgccatgggtaatggtgccagtctt SEQ_DUP;.........................12......................... DUP_NUM;.A.....A...a.....aaaaB..A...HaaA....aa.A...A....A... Cas9_Wa;aaa..A.....A.aaaHa....AaaH......A.aa.............Aa. Cas9_Cr;................L................................... Cpf1_Wa;..LLN...........N.................k................. 
Cpf1_Cr; ATP7BgtccatgttggctgacctgtgtctcAAgagatttgtaggcctgaacgtagaa SEQ_DUP;.........................12......................... DUP_NUM;..A..aA..a....a.A.....Ia.aC...A..aA...aB..A..aB.A... Cas9_Wa;..A.a..aa.......A...Aa.....a.a..............Aa....A. Cas9_Cr;.N.......................L....N..................... Cpf1_Wa;....N................LN......N.....................L Cpf1_Cr; NRLcccagctgctgctgcagggtagccaGGccagtacagctcctccaggcctggc SEQ_DUP;.........................12......................... DUP_NUM;A..A..A..A.HaaA..A...aAJ..A....A........aA...aA.Haaa Cas9_Wa;a..Aaaaa..A..A..A..A.......Aa...Aa....A..a.aa.aa...A Cas9_Cr;...............L.................................... Cpf1_Wa;.................................................... Cpf1_Cr; RDH12caggagcccgagtctatattgcctgCCagagatgtactgaagggggagtctg SEQ_DUP;.........................12......................... DUP_NUM;.A..Ha.A........A...A...a.aC.A....aB.aaaaa.A...A..A. Cas9_Wa;...AaaH....Aaa....a.......Aa..Aa.........A..J....... Cas9_Cr;...........k....N.....................M..N.......... Cpf1_Wa;.................LN....L...............LK.........L. Cpf1_Cr; SPATA7atgccaaagaaaaaatagctcctttAAcctttagaagggcatgactcaacat SEQ_DUP;.........................12......................... DUP_NUM;....aB.......A...............aB.aaA...a.........aaaC Cas9_Wa;........Aa.............A.aa.....Aa..........A....A.a Cas9_Cr;.........N..........M.........................k..... Cpf1_Wa;....N........N............N.....L.........N......... Cpf1_Cr; OCA2cttctcggaggaggcagatgcagacAAgaccagacacctccctgcttagcag SEQ_DUP;.........................12......................... DUP_NUM;..aa.aa.aA..aC.A..a....a..J.a..........A....A..aA... Cas9_Wa;.a.aaa.Ba.a........A.....A...A....Aa...A.aa.aaa..A.. Cas9_Cr;..........L..........L..N........................... Cpf1_Wa;....N............................................... Cpf1_Cr; EIF2AK4ataagctcttgtgacctcctggttgTTaagtgttggccagatgtctatgtcc SEQ_DUP;.........................12......................... 
DUP_NUM;A.....a.a.......aA..A....a.A..aA...aC.A.....A....aaA Cas9_Wa;...A......A.a......Aa.aa................JAa......a.. Cas9_Cr;......k......L....N............N.............N..N... Cpf1_Wa;......N.................................N....N...... Cpf1_Cr; CAPN3ctgagttcccaccggatgagacctcTTctcttttatagccagaagttcccca SEQ_DUP;.........................12......................... DUP_NUM;A.......HaaC.a.a.................A...aB.A........... Cas9_Wa;...AaaH....Baaa.aa........Aa.a.Ba.a........Aa......B Cas9_Cr;........N...k.M.............NLL.................N... Cpf1_Wa;.............M.......N.....................LLN...... Cpf1_Cr; TUBGCP4aacagctctcagcctggatgctccaTTggactcctcttggaccagcatgaag SEQ_DUP;.........................12......................... DUP_NUM;A......A..HaaC.A.......aa.........aa....A...aBHaB... Cas9_Wa;......JA..A.a.a..Aa......A.aa......A.aa.a.....Aa..A. Cas9_Cr;...k................M........................L..N... Cpf1_Wa;......L...........L........LN..N..........k......... Cpf1_Cr; SPG11cataccttggcaagatcatacagacAAgaggacctgtcgacagtagttcttc SEQ_DUP;.........................12......................... DUP_NUM;....aA...ac.......a....a.aa....A..a...A..A.......... Cas9_Wa;...A.a...Aa....A....Ca...A...A.......Aa...a..A...... Cas9_Cr;.................N...........N...................... Cpf1_Wa;....N..............................L................ Cpf1_Cr; SPG11tgggtctccaaatcccagagggtaaTTggtatagcccatcctttccacttcc SEQ_DUP;.........................12......................... DUP_NUM;.............aHaaA.....aA....A.....................A Cas9_Wa;..Aaa.....a.aa...CaaaI.................aaa.CaaH.Baa. Cas9_Cr;..............................L.....L...........N... Cpf1_Wa;..N......M......................N...k...........LN.. Cpf1_Cr; DUOXA2ttcacctggcgtctgaaagagaattAAcgccgcggagtacgcgaacgcactg SEQ_DUP;.........................12......................... DUP_NUM;...aa.A...aB..aHaB......A..a.aa.A...a.ab..A....aa.aB Cas9_Wa;..A...Ba.aa...A..a..............A.aa.a......A.aJ..A. 
Cas9_Cr;......N................NL.....................N..... Cpf1_Wa;N...N.......L........LN.......L...N..........L...... Cpf1_Cr; FBN1aagcccaaagccttcaaagacacttAAccttggcaccttcttccactggagg SEQ_DUP;.........................12......................... DUP_NUM;.....A......J.a..........J.aA..............aa.aa.... Cas9_Wa;........Aaa....Aa.Ba.....A.a....Aa....A.aa.Ba.Baa.a. Cas9_Cr;...................N...............NL.........N..... Cpf1_Wa;....N.....................L......N..Lk..........L... Cpf1_Cr; KIF7gtgccgcgccgcgttgcccatctccAAggaggctcagcacctcatccaggcc SEQ_DUP;.........................12......................... DUP_NUM;.a.A..a.A..A...........aa.aA.J..A...........aA...... Cas9_Wa;aaa.....Aa.a.aa.a....Aaa.Ca.aa.......A.a..A.aa.a.Caa Cas9_Cr;...N..............L.................N.........L..... Cpf1_Wa;....N............................................... Cpf1_Cr; BLMaagcagtatttttttttccaactagTTggggacatgattttcgtcaagatta SEQ_DUP;.........................12......................... DUP_NUM;.A..................A..aaaa.I..aC.....A.I..aC....... Cas9_Wa;a.......A.......J....Baa..A..........A.......Ba..a.. Cas9_Cr;.........L............................kLL.......N... Cpf1_Wa;........L...............N.......k....N..LN.......... Cpf1_Cr; BLMtccttctgttccggtgatggctcttAAcggccacagctaatcccagggtaca SEQ_DUP;.........................12......................... DUP_NUM;...A....aa.aC.aA........aA.....A........haaA....aB.a Cas9_Wa;.....Baa.Ba...Baa........A.a....A..Aa.a..A...CaaaI.. Cas9_Cr;......M...NL..........kL..N....NLL............N..... Cpf1_Wa;....N............N..............N................... Cpf1_Cr; BLMtgtccattacttcaatatttttaatAAccgtcactctcaagaagcttgcagg SEQ_DUP;.........................12......................... DUP_NUM;.........................A..........aB.A...A..aahaaA Cas9_Wa;........aaJ...A.Ba..............aa..a.a.aHa......A.. Cas9_Cr;...................k......L..N...NL........k........ Cpf1_Wa;.N..n............N..N............................... 
Cpf1_Cr; ERCC4gtactacatgaagtggagccaagatAAcgtggttctttatgacgcagagcta SEQ_DUP;.........................12......................... DUP_NUM;.....aB.a.aa.A....aC....a.aA........a..A..a.A....... Cas9_Wa;A.......A..a...........Aa.......A.....Ba.......A.a.. Cas9_Cr;.........M.......................................... Cpf1_Wa;..M.N.........................N..................Lk. Cpf1_Cr; PALB2cagaaagggtcccactgctactaacTTagcctcctctttgtcaggccaagca SEQ_DUP;.........................12......................... DUP_NUM;.HaaA.......A...........A..........A...aA.J..A....A. Cas9_Wa;....Ca.........aaa.a..A..A...A....Aa.aa.a.....a...Aa Cas9_Cr;.........N.....N..N..............L..............N... Cpf1_Wa;.n........................N............M............ Cpf1_Cr; PALB2gtgctgactactaccgctatctgatAAgagtctgtaaaggaactgtagtcgc SEQ_DUP;.........................12......................... DUP_NUM;.a.........A......aC..Ha.A...A....aaB...A..A..A....a Cas9_Wa;BaJ.....A..JA..A..Aa.a..Ca..........a..........A.... Cas9_Cr;...k.............NL......................M.......... Cpf1_Wa;..M.N.........k...LN.................Lk............. Cpf1_Cr; CLN3ctcttaccagcggtattgctgagcgTTgactcagggaagtgttccagaaaaa SEQ_DUP;.........................12......................... DUP_NUM;.....a.aA....A..a.a.A..a.....aaaB.a.A.....aB....a.aa Cas9_Wa;a.a.aa.a...Aa..A.......a....A.....A.aH.........Baa.. Cas9_Cr;..L.......................N...........N.........N... Cpf1_Wa;.............LLN..........k.............k......N..k. Cpf1_Cr; CLN3atggacagcagggtctgctgaggggAAgaggccggcctgggtgaggcccagg SEQ_DUP;.........................12......................... DUP_NUM;...A.HaaA...A..a.aaaaB.a.aA..aA..Haaa.a.aA....aA..aa Cas9_Wa;aaH.A.....A..A.....a..A.............Aa..Aa.........A Cas9_Cr;.................................................... Cpf1_Wa;..LLN............................................... Cpf1_Cr; TK2ggcggcccagccccgcagcggccacAAgcagcatagccgggcgagcggatcc SEQ_DUP;.........................12......................... 
DUP_NUM;A....A....A..a.aA......A..A....A..aaa.a.aHaac....a.a Cas9_Wa;.aaa...A..Aaa..Aaaa.a..A..Aa.a...A..A....Aa...A...A. Cas9_Cr;.................................................... Cpf1_Wa;....N......M.............L.......................... Cpf1_Cr; APRTagcagttggctgcggggagacccttAAccaccagtggccagcagatcatcca SEQ_DUP;.........................12......................... DUP_NUM;A..aA..a.aaaa.a..............a.aA...A..aC........a.. Cas9_Wa;.Aa.a..A......A..A.......AaaH...Aa.aa.....Aa..A...Ca Cas9_Cr;.L...........N..............N.................N..... Cpf1_Wa;....N............................................... Cpf1_Cr; SPG7ttcagccaaggcctgcagcagatgaTTggaccatgtgagtcggctctggcca SEQ_DUP;.........................12......................... DUP_NUM;A....aA...A..A..aC.aC..aa.....aHa.A..aA....aa....... Cas9_Wa;.aaaa.Ba..Aa....Aa..A..A...........Aa........a..A.a. Cas9_Cr;.......................NL.......................N... Cpf1_Wa;......L............................................. Cpf1_Cr; CTNScttcagcctgcacgcggttgtcctcAAcgctgatcatcatcgtgcagtgctg SEQ_DUP;.........................12......................... DUP_NUM;.AJ..A...a.aA..A........A..aC........a.A..a.A..A...A Cas9_Wa;..a.Ba.Ba..Aa..A.a.a......aa.a..A.a...Ca.Ca.Ca...A.. Cas9_Cr;.....................N..NL..............N........... Cpf1_Wa;....N............................................... Cpf1_Cr; CTNSctcgcccccagcgcggtggccagcgCCgtgtcctggcctgccatcggcttcc SEQ_DUP;.........................12......................... DUP_NUM;......a.a.aa.aA...a.A..a.A....aA...A.....aA......aa. Cas9_Wa;a.aa.a.a.aaaaa..A.a.....Aa..A..aa....aaJ..Aa..Aa.Ca. Cas9_Cr;.....L.............................................. Cpf1_Wa;.................................................... Cpf1_Cr; FLCNcgtactggctgctgtatgggatgatGGcggacgcagcccacgggaagcatgg SEQ_DUP;.........................12......................... DUP_NUM;..aA..A..A...aaaC.aC.aa.aa..A..A.....aaaB.A...aA...a Cas9_Wa;a.aa.a...A..JA..A.......J.......A...A.a..Aaa.a...... 
Cas9_Cr;....N.................................M............. Cpf1_Wa;.......L.............LLN............................ Cpf1_Cr; FLCNggcctggcggacaatgctgaagagcTTgggggtggctggggtgctggtggct SEQ_DUP;.........................12......................... DUP_NUM;.aa.aa.....A..aB.a.A...aaaaa.aA..aaaa.A..aa.aA..a..A Cas9_Wa;.A.a...Aa...A...A....A.......A..........A.......A..J Cas9_Cr;..............L.................................N... Cpf1_Wa;........................................N..L........ Cpf1_Cr; RAD51DgcacagtccgaccctgagcacgcccAAtgttccccgcaggccggaacagccc SEQ_DUP;.........................12......................... DUP_NUM;.A...a....Ja.A...A......A......A..aA..aaB...A.....aa Cas9_Wa;Aa....A.a...aa..AaaH...A.a.aaa.....Baaaa.a...Aa....A Cas9_Cr;..............................L..................... Cpf1_Wa;....N................LLN............L............... Cpf1_Cr; BRCA1cactctcgggtcaccacaggtgcctCCacacatctgcccaattgctggagac SEQ_DUP;.........................12......................... DUP_NUM;..HaaA........aa.A..J..........A.......A..aa.a...aJa Cas9_Wa;.A.aaaHa.aHa....a.aa.a.....Aa.aa.a.a.Ca..Aaa.....A.. Cas9_Cr;...............L................................L... Cpf1_Wa;..................N......L.........N....N........... Cpf1_Cr; BRCA1agggaagctcttcatcctcactagaTTaagttctcttctgaggactctaatt SEQ_DUP;.........................12......................... DUP_NUM;B.A.............I..aC....A.........a.aa............. Cas9_Wa;J...A.......A.a.Ba.CaaHa.a..........Ba.a.Ba......A.a Cas9_Cr;...............N.................NL.............N... Cpf1_Wa;......N....................N.....................N.. Cpf1_Cr; BRCA1tagataagttctcttctgaggactcTTaatttcttggcccctcttcggtaac SEQ_DUP;.........................12......................... DUP_NUM;...A.........a.aa..............aA.........aA.......a Cas9_Wa;aHa.a.........Ba.a.Ba......A.aH.....Ba....Aaaa.a.Ba. Cas9_Cr;............NL.................N....N...........N... Cpf1_Wa;......N.....................N..........k............ 
Cpf1_Cr; BRCA1cttttgttttattctcatgaccactAAttagtaatattcatcacttgaccat SEQ_DUP;.........................12......................... DUP_NUM;.A............a...........A...............a........A Cas9_Wa;.Ca.aa...........BaHa....Aa.a.............BaHCa.a... Cas9_Cr;.k.....N..................k....k..N...............N. Cpf1_Wa;....N......N........................................ Cpf1_Cr; RAD51CgaatgtctcacaaataaaccaagatAAtgctggtacatctgagtcacacaag SEQ_DUP;.........................12......................... DUP_NUM;A.................aC....AJ.aA......Ha.a........aBJa. Cas9_Wa;...........a.a.a.......Aa.........A.....A.Ca.....a.a Cas9_Cr;............N.M..................................... Cpf1_Wa;..M.N.......................N..N.............LLN.... Cpf1_Cr; BRIP1tgaggtactgtactttaaagaggtcAActtcaagtgtagactcattgtcctg SEQ_DUP;.........................12......................... DUP_NUM;A.J..A.........a.aA..........a.A..a.......A....A.... Cas9_Wa;............A..J.A..J........a..A.Ba........JA.aH... Cas9_Cr;.N......NLL..........................k.............N Cpf1_Wa;....N.....N.....................M................... Cpf1_Cr; BRIP1actcactttaccacgacaaactgctAAccaggagagctccatcttaaacaac SEQ_DUP;.........................12......................... DUP_NUM;..........a.......A.......aa.a.A.................aB. Cas9_Wa;AaaHa.a.aHa....Aa.a..A...A..A...Aa.......A.aa.Ca.... Cas9_Cr;.............L................k..................... Cpf1_Wa;....N...................k...N....k.......M.....k.... Cpf1_Cr; BRIP1ttagcctccagctggatagtaaatgTTaacaccaagttctgacgaaaaggat SEQ_DUP;.........................12......................... DUP_NUM;......A.HaaC..A.....AJ.........A....a..aB.IHaaC..... Cas9_Wa;.........Aa.aa..A.................A.aa....Ba...A.... Cas9_Cr;...........k.......M...N......L.................N... Cpf1_Wa;......N.....N.........Lk.................M.N........ Cpf1_Cr; CANT1tggcgggcgcctggccgagcctccaGGttgtgtgcattgtgggtggggggcc SEQ_DUP;.........................12......................... 
DUP_NUM;aaa.A...aA..a.A......aA..a.a.A....aHaaa.aaaaaA...A.. Cas9_Wa;Aa.J....A...A.aa...Aa...Aa.aa..........AJ.J......... Cas9_Cr;.............N........N......................L....N. Cpf1_Wa;..................................................L. Cpf1_Cr; PYCR1cccacctgggggcctacctgctgctGGtgaagcccttggccagggcaaaagc SEQ_DUP;.........................12......................... DUP_NUM;...aaaaA.......A..A..aa.aB.A.....aA...aaA.....A...A. Cas9_Wa;a..aaaaa.aa......Aa..Aa..A..A........Aaa....Aa....A. Cas9_Cr;...L.................L.............................. Cpf1_Wa;.......LN................k.......................LN. Cpf1_Cr; NPC1aaaaggatgacaggaacatactgggAAgccacttctcctaggaccctgccca SEQ_DUP;.........................12......................... DUP_NUM;aaC.a...aab.......aaaB.A............aa.....A....A..A Cas9_Wa;a.a............A.....A...A.......Aa.a.Ba.aa.....AaaH Cas9_Cr;.................................................... Cpf1_Wa;..LLN............................................... Cpf1_Cr; LAMA3tcaaagtcaactgcaagcgagttatGGtggagtttagacccagccaggtaac SEQ_DUP;.........................12......................... DUP_NUM;.A......A...aHa.A....aa.aa.A....a.....A...aA....A... Cas9_Wa;aaa...a.....a..A..A...A....................AaaH.Aa.. Cas9_Cr;.......k..NL......L.........................N.M..... Cpf1_Wa;.......L....................N..........N............ Cpf1_Cr; SERPINB7ctgctgtaatggtgctggtgaatgcTTgtgtacttcaaaggcaagtggcaat SEQ_DUP;.........................12......................... DUP_NUM;.A....aa.A..aaHaB..A..Ja.A.........aA...a.aA......A. Cas9_Wa;A.a.Ca..A..........A..J......A.......AJBa.....A..... Cas9_Cr;................................................N... Cpf1_Wa;...............k.....N......N.............N.....Lk.. Cpf1_Cr; LDLRacggctcagacgagcaaggctgtcgTTaagtgtggccctgcctttgctattg SEQ_DUP;.........................12......................... DUP_NUM;....a..a.A...aA..A..A....a.a.aA....A.....A.....a.A.. 
Cas9_Wa;A..A..A..A.a...A...A....A...a...........aaa..Aa....A Cas9_Cr;................................................N... Cpf1_Wa;......N..........................................L.. Cpf1_Cr; LDLRagcccccaagacgtgctcccaggacGGagtttcgctgccacgatgggaagtg SEQ_DUP;.........................12......................... DUP_NUM;.....a..a.A......aa..aa.A....A..A....aC.aaaB.a.A.... Cas9_Wa;aa..A..Aaaaa....A...A.aaa....A......Ba.a..Aa.a...... Cas9_Cr;.............L...L......................L........... Cpf1_Wa;....L...................LLN......................... Cpf1_Cr; LDLRgagtgcctgtgccccgacggcttccAAgctggtggcccagcgaagatgcgaa SEQ_DUP;.........................12......................... DUP_NUM;A...a.A....a..aA.......A..aa.aA....a.aB.aC.a.aB.aa.a Cas9_Wa;.A..A.....Aa.J..Aaaa..A..A.Baa...A......Aaa..A...... Cas9_Cr;.........N..................................NLL..... Cpf1_Wa;....N...............LN......LN................L..... Cpf1_Cr; LDLRatctgtgcctccctgccccgcagatCCaacccccactcgcccaagtttacct SEQ_DUP;.........................12......................... DUP_NUM;a.A.......A....A..aC..............A.....A.......a.A. Cas9_Wa;Aa....Ca.J..Aa.aaa..Aaaa.a...CaaH.Aaaaa.a.aHaaa..... Cas9_Cr;.......................M.........L..............L... Cpf1_Wa;......N..............N...................L.......... Cpf1_Cr; LDLRaggctaaaggtcagctccacagccgTTaaggacacagcacacaaccacccga SEQ_DUP;.........................12......................... DUP_NUM;....aA...A.......A..A...Jaa..J.JA.............a....A Cas9_Wa;aa..a...A.......a..A.aa.a..Aa........A.a..A.a.a..Aa. Cas9_Cr;...............L.......................L........N... Cpf1_Wa;......N..............N.............................. Cpf1_Cr; ACP5agcgcagatagccgttggggaccttGGcgctggtgccgctttgaggggtcca SEQ_DUP;.........................12......................... DUP_NUM;..aC..A..A..aaaa.....aa.A..aa.A..A....a.aaaA.....aB. Cas9_Wa;.......A.a......Aa........Aa....A.a.....Aa.a........ Cas9_Cr;......NL....L........................N........N..... 
Cpf1_Wa;................................Lk.................. Cpf1_Cr; CYP4F22ctggggctggagaagacggcgttccGGcatatacgcggtgtccacccttctc SEQ_DUP;.........................12......................... DUP_NUM;aA..aa.aB.a..A....aAJ......a.aa.A................... Cas9_Wa;.aa.aa.....A.........A..A..Baa..A.....A.a.....aaJaaa Cas9_Cr;............................................NLL..... Cpf1_Wa;.......M.M.......................................... Cpf1_Cr; JAK3ggaggaaggggctcacctttgggtcTTggggatacagcaggaagtgagggtc SEQ_DUP;.........................12......................... DUP_NUM;  Cas9_Wa;aB.aaaA........HaaA....aaaaC....A..aaB.a.aHaaA.....A Cas9_Cr;aa..............A.a.aa.......a.........A..A......... Cpf1_Wa;................NLL......................k......N... Cpf1_Cr; ADCK4ctgggggagactcacctcgatgtaaTTggtctgtgaactctgtcccaaactc SEQ_DUP;.........................12......................... DUP_NUM;aaa.a.........aC.A.....aA...a.aB.....A............aA Cas9_Wa;.....a.........A.aHaa.a............a......A.aH..aaa. Cas9_Cr;......kL........................................N... Cpf1_Wa;..N..........LN..........k.................k........ Cpf1_Cr; ZNF575agtttggtccccggactgaccagcaTTagagcagccgcagccccagctcctt SEQ_DUP;.........................12......................... DUP_NUM;.aA.....aa...a....A.....a.A..A..A..A.....A......aC.. Cas9_Wa;aaH..........aaaa...A...Aa..A.......A..Aa.a..Aaaa..A Cas9_Cr;..................L.......k....L................N... Cpf1_Wa;.................................................... Cpf1_Cr; PNKPcctcggcttccagctctcggagcttAAcggggaatctctgggtacaagatcc SEQ_DUP;.........................12......................... DUP_NUM;aA......A.....aa.A......aaaaB.....haaA.....aC......a Cas9_Wa;....Aaa.a..A.Baa..A.a.a....A....A......Ca.aI.....A.. Cas9_Cr;................N.............NLL.............N..... Cpf1_Wa;....N....LLN............N..................LN....... Cpf1_Cr; NLRP12cccaggggataccccagggatacttAAcagctgcaccaacagcgtgtgcgct SEQ_DUP;.........................12......................... 
DUP_NUM;aaaaC.......aaaC.........a..A........a.a.a.a.A.....a Cas9_Wa;.A..Aaaa........Aaaa.......A....A..A..A.aa..A..A.... Cas9_Cr;..............................................N..... Cpf1_Wa;....N...........N...................L............... Cpf1_Cr; DNAAF3tcatgcgcaggtcccagtcgctgacAAccgcgccgggcgtcgtagcgggagc SEQ_DUP;.........................12......................... DUP_NUM;a.A..aA.....A..A..a......a.A..aaa.A..A..a.aaa.A....a Cas9_Wa;A..A.Ba...A.a....aaa...a.a...A..Aa.a.aa...A..a....A. Cas9_Cr;......................NL...........L................ Cpf1_Wa;....N.....................L......................... Cpf1_Cr; DNAAF3caggtgccgtccatccacagagcccAAgaagcagcacatctagctcggggtt SEQ_DUP;.........................12......................... DUP_NUM;.A..A..........a.A.....aB.a..A........A...aaaA..A.Ja Cas9_Wa;a.a..A.....Aa.Jaa.CaaHa....Aaa......A..A.a.Ca...A.a. Cas9_Cr;NL...............L...............L...L.............. Cpf1_Wa;....N..N............................................ Cpf1_Cr; KCNE1gaaagggcgtcaccgctgtggtgttAAgacaggatcatcctgggcattaagg SEQ_DUP;.........................12......................... DUP_NUM;aaa.A.....A..a.aa.A....a..HaaC.......aaA......aA.... Cas9_Wa;...a........A..a.aa.a..........J..A....Ca.CaaH...A.. Cas9_Cr;................N.............................N..... Cpf1_Wa;....N......................N....................k... Cpf1_Cr; HLCStcgggaatggactccctggtagcaaTTgaccaacagcagacagttgtccgtc SEQ_DUP;.........................12......................... DUP_NUM;aB..aa.......aA..A.....a.......A..a...A..A...A....Ha Cas9_Wa;.J.a.Ba.........A.aaa......A......Aa..A..A...A...... Cas9_Cr;........N.............NL............L...........N... Cpf1_Wa;..N.......N......................................... Cpf1_Cr; TMPRSS3aggtaaccacgtggccagaggcacaTTccctccctaaagcggagaaaaagta SEQ_DUP;.........................12......................... DUP_NUM;......a.aA...a.aA.................a.aa.aB....A..aA.. Cas9_Wa;aJ.A.......Aa.a....Aa.....A.a..Baaa.aaa.....A....... 
Cas9_Cr;.......N........................................NLL. Cpf1_Wa;..............k....L...k...............Lk........... Cpf1_Cr; PCNTctggcagccgcctccacctaggttcTTgcccgcagggctgccggctcggatg SEQ_DUP;.........................12......................... DUP_NUM;..A..A..........aA.....A...A..aaA..A..aA..HaaC.a.aa. Cas9_Wa;....Aa.J.A..Aa.aa.aa.aa.....Ba...Aaa.a....A..Aa..A.a Cas9_Cr;....................................L........N..N... Cpf1_Wa;..........................L.....L...........LLN..... Cpf1_Cr; PEX26agctcattggaggtgaagtgctcccTTgtgtgttgtggggatccaggccctg SEQ_DUP;.........................12......................... DUP_NUM;....aa.aa.aB.a.A.......a.a.A..a.aaaaC....aA....aA..a Cas9_Wa;A...A..A.a...............A.aaa........J.J.....CaaH.. Cas9_Cr;......N......k...............N...............L..N... Cpf1_Wa;.................L.................k...L..........N. Cpf1_Cr; SNAP29agaacaggaagcaaagtaccaggccAAgccacccaaaccttagaaagctgga SEQ_DUP;.........................12......................... DUP_NUM;..aaB.A.J..A.....aA....A..............aB..A.Haac.aC. Cas9_Wa;.J.......A......A......Aa.J.Aa...Aa.aaaH..Aa........ Cas9_Cr;.........M.......................................... Cpf1_Wa;....N........k........k.....L....M.......N...L.M.... Cpf1_Cr; CHEK2accctttcatattcatacctttctcTTgagacttctgcccagacttcaggaa SEQ_DUP;.........................12......................... DUP_NUM;.......J...............a.a......A....a......aaB..aB. Cas9_Wa;BaH...AaaH.Ba....BaH..Aa..Ba.a......A.Ba..Aaa...A.Ba Cas9_Cr;..k..............N..N.......kL....NL.......k....N... Cpf1_Wa;............................LN..LN.................. Cpf1_Cr; CHEK2gaacaggcactgctgccatgagactGGctgagcctcaacatccgactcccga SEQ_DUP;.........................12......................... DUP_NUM;.aA....A..A....a.a...aA..a.A...........a......a.a... Cas9_Wa;.A......A...A.a..A..Aa......A...A....Aa.a..A.CaaH.A. Cas9_Cr;.................................................... 
Cpf1_Wa;...............N.............................k.....k Cpf1_Cr; CYBBtgagtaaacaaagcatctccaactcTTgagtctggccctcggggagtgcatt SEQ_DUP;.........................12......................... DUP_NUM;........A.............Ha.A...aA.....aaaa.a.A........ Cas9_Wa;..AaaH.......A....A.Ca.aa..A.aH.....a...Aaa.a....... Cas9_Cr;..........N..............................L......N... Cpf1_Wa;.....................L.................N..N..Lk..... Cpf1_Cr; CHMtcatgagttatcaattatttttcttAAcctatctccatttcagtatatggaa SEQ_DUP;.........................12......................... DUP_NUM;a.A...................................A.....aaB..... Cas9_Wa;.Baa.Ca........Ca.........Ba....Aa..Ca.aa...Ba...... Cas9_Cr;M......N....N.....kLL.........N.M....N.....k..N..... Cpf1_Wa;....N..................M...LLN..k...N..........N.... Cpf1_Cr; CHMgaacagcaaaagttcctggttcctcTTgctggcactgtcaaaatggaaatct SEQ_DUP;.........................12......................... DUP_NUM;.A.....A.....aA........AJ.aA....A.......aaB......... Cas9_Wa;..AaaH..A..A......Baa....Baa.a..A...A.a...a......... Cas9_Cr;...................................NL.....NL....N... Cpf1_Wa;..................k....LLK.....k......k.....k....... Cpf1_Cr; ABCA4tctcccacaaaggcaatggcaaccgACACagctttctctgcatgccacctggag SEQ_DUP;.........................1122......................... DUP_NUM;.......aA....aA...J.a.....A........A...A......aa.aA... Cas9_Wa;...A.Ca.aaa.a.....A.....A..Aa..A.a..A..Ba.a..A...Aa.aa Cas9_Cr;...L.............N........L........................... Cpf1_Wa;.............................L.......N..N......L...... Cpf1_Cr; FLAD1cactcatgatgatgtgacctttgagGCGCagtggcacaggcctttggagatgag SEQ_DUP;.........................1122......................... DUP_NUM;...aC.aC.a.a......a.aa.A..a.aA....aA.....aa.aC.a.A..aB Cas9_Wa;..Aaaa.a.aH...........Aa.......A.a.....A.a...Aa....... Cas9_Cr;...........................................k.......... 
Cpf1_Wa;........................L..........LN...........N....L Cpf1_Cr; MSH6atttggccgttattcagattccctgGTGTgcagaagggctataaagtagcacga SEQ_DUP;.........................1122......................... DUP_NUM;aA..A....I..aC......aa.a.A..aB.aaA.......a..A..Ha.a.aa Cas9_Wa;....A......Aa.....BaH...Baaa.......AJ.J....A.......... Cas9_Cr;..N.......kL.......N.....k......N..NL....NLL.......... Cpf1_Wa;............N......M.k.............LLN................ Cpf1_Cr; SLC19A3ataaggaatggttctgagggtctcaTCTCatggagaaaaaaccaaataagcaga SEQ_DUP;.........................1122......................... DUP_NUM;aaB..aA....aHaaA...........aa.aB.............A..aHaaC. Cas9_Wa;.................Ba.......a.a.Ca.a............Aa...... Cas9_Cr;.M......L.........M...............N................... Cpf1_Wa;..........L...k.......k...N.....................k..... Cpf1_Cr; AGXTtccaacctgcctcctcgcatcatggCACAgccggggggctgcagatgatcgggt SEQ_DUP;.........................1122......................... DUP_NUM;....A.......A....J.aA....A..aaaaaA..A..aC.aC.HaaA..... Cas9_Wa;..aa.Baa..Aa..Aa.aa.a.a.Ca....A.a..Aa......A..A......C Cas9_Cr;......................NLL............................. Cpf1_Wa;.........................................N...M........ Cpf1_Cr; DNAH5cattttgttccatcagctgtcgcacTATAtctcattcgtaacctacaaaagaca SEQ_DUP;.........................1122......................... DUP_NUM;..A........A..a..A...............A............a....... Cas9_Wa;...CaaHI.....Baa.Ca..A...a.a.a....Ca.a..BaH...Aa..A... Cas9_Cr;....N........N.M......L....k..NLL................M.M.. Cpf1_Wa;.....M............N......k......N........N...k........ Cpf1_Cr; DNAH8acaatattagaacaaattttttgatAGAGacaccattgcaaaacaacataaagt SEQ_DUP;.........................1122......................... DUP_NUM;.....aB...........aC..a.a........A..............A...A. Cas9_Wa;......A..........A.................A.aa....A....A..A.. Cas9_Cr;.......k...NLL......N........N.............k.......... Cpf1_Wa;..M...............k....N..M.k....N.....N....M......... 
Cpf1_Cr; WISP3atttgtcttttctggatgctcaagtACACtcagagttacaaacccactttttgt SEQ_DUP;.........................1122......................... DUP_NUM;A.......HaaC.A..J.JA.......Ha.A.................a.aaB. Cas9_Wa;...........a...Ba......A.a.....A.aJaH......A...AaaHa.. Cas9_Cr;kLL.......L..............k......k..................... Cpf1_Wa;N.................k..............LLN........L.M.N..... Cpf1_Cr; AHI1agctggatggaatttagccgtgtaaACACaaaagaaggatgaggtaaaactctg SEQ_DUP;.........................1122......................... DUP_NUM;aaC.aaB.....A..a.AJ..........aBHaaC.a.aA.........aB..A Cas9_Wa;a...A..A..............Aa......JA.a.................... Cas9_Cr;...................N................k................. Cpf1_Wa;..k.....k....N..........k.......lK.................... Cpf1_Cr; CFTRtattatgtgttttacatttacgtggGAGAgtagccgacactttgcttgctatgg SEQ_DUP;.........................1122......................... DUP_NUM;..a.A............a.aaaHa.A..AJ.a.......A...A...IaaaC.. Cas9_Wa;..Ba......J......J.A.....A............Aa..A.a....A...A Cas9_Cr;...N.M.............NL....N.M......k.....k............. Cpf1_Wa;...L............................L..................... Cpf1_Cr; TMEM70gtccccgggcctctgtctcccgggcGTGTcctccagcagcgggccttcggggcc SEQ_DUP;.........................1122......................... DUP_NUM;..aaA.....A......aaa.a.A.......A..a.aaA.....aaaA..aA.. Cas9_Wa;aa.....aaaa...Aa.a...a.aaa...A....aaJaa..A..A...Aa.Ba. Cas9_Cr;..................L......L...............L............ Cpf1_Wa;............................................L......... Cpf1_Cr; BRCA2actctgaagaacttttctcagacaaTGTGagaataattttgtcttccaagtagc SEQ_DUP;.........................1122......................... DUP_NUM;.aB.aB..........a.....a.aHaB........A........A..A..... Cas9_Wa;CaaHI.A.aH......A...Ba.a...A...................a.Baa.. Cas9_Cr;kL.................L.................k................ Cpf1_Wa;..N.......N..N............N.......N..Lk...LN..N....... 
Cpf1_Cr; POMT2gttgttgtagtccttgtgcaaatagGTGTggtgacctgggtggggggtgggggc SEQ_DUP;.........................1122......................... DUP_NUM;..A..A.....a.A......aa.a.aa.a...Haaa.aaaaaa.aaaaa.aa.a Cas9_Wa;aHa.............aa.....A..J.........J..Aa............. Cas9_Cr;N.M...k..N.......L......N..N........N................. Cpf1_Wa;.................................L...LLN....LN........ Cpf1_Cr; PALB2atagagtctgtaaaggaactgtagtCGCGccctggtgaaattaggtcttcttag SEQ_DUP;.........................1122......................... DUP_NUM;.A...A....aaB...A..A..a.A....aa.aB.....aA........aaB.. Cas9_Wa;.Ca.........a..........A......a.a.aaa..............a.B Cas9_Cr;..................M................................... Cpf1_Wa;...............Lk................LN.......N........... Cpf1_Cr; RAD51DgtatagagaccagcatcaagcagttTATAtcaagactgatggcagaagagaaga SEQ_DUP;.........................1122......................... DUP_NUM;.a.a....A......A..A..........a...aC.aA..aB.a.aB.aB...J Cas9_Wa;a.aa.......J..Aa..A.Ca...A........Ca....A......A...... Cas9_Cr;.......kL.......N...L....M.....................k.M.M.. Cpf1_Wa;.....M....N.............N....N..k.....N....N.......... Cpf1_Cr; BRCA1gcacacacacacacgctttttacctGAGAgtggttaaaatgtcactctgagagg SEQ_DUP;.........................1122......................... DUP_NUM;.J.J......A..........aHa.a.aA.......A.......a.aHaaC..A Cas9_Wa;a.....A.aJa.a.a.a.a.a......Aa..................a.a.aH. Cas9_Cr;..........................................k........... Cpf1_Wa;..............k..................M.................... Cpf1_Cr; BRCA1ccttgattttcttccttttgttcacATATtcaaaagtgacttttggactttgtt SEQ_DUP;.........................1122......................... DUP_NUM;aC.............A...............a.a......aa.....A...... Cas9_Wa;BaH.Baa.......Ba.Baa......Ba.a....BaH.......A.......A. Cas9_Cr;..N.....N........N...kL..N.....k..NL....k..NL......NL. Cpf1_Wa;....M.....k.............L.............N............... 
Cpf1_Cr; BRCA1agcgcatgaatatgcctggtagaagACACttcctcctcagcctattctttttag SEQ_DUP;.........................1122......................... DUP_NUM;..HaB....A...aA..ab.a..............A.............aa.A. Cas9_Wa;..Ba...A.a.........Aa..........A.a.Baa.aa.a..Aa...BaH. Cas9_Cr;...k...............NL.............M.................NL Cpf1_Wa;.N.......................................LN.....L.M... Cpf1_Cr; KPTNggcctctttcggggcctctcctaccTATActggtcaggttcgtcagctctggga SEQ_DUP;.........................1122......................... DUP_NUM;......aaaA.........J.......aA...aA...A...A....aaaB.a.a Cas9_Wa;aa.....Aa.a..Ba....Aa.a.aa..Aa....A....a....Ba..a..A.a Cas9_Cr;................L.............kL.................M.... Cpf1_Wa;.....M........................LLN..................... Cpf1_Cr; SPTA1actgctggaacttcagggcccgcagCAACAActggtcacccttctccagggtcagc SEQ_DUP;.........................111222......................... DUP_NUM;..aaB......aaA...A..A........aA............HaaA...A..... Cas9_Wa;..A..JA..A.....A.Ba....Aaa.a..A..A..A....a.aaaHBa.aa.... Cas9_Cr;..................................NL.................... Cpf1_Wa;.....N..N.............................N................. Cpf1_Cr; TREX1ctccagactcgcacacggctgagggTGATGAtgtcctggccctgctcagcatctgt SEQ_DUP;.........................111222......................... DUP_NUM;.a.J.JA.....aA..aHaaa.aC.aC.A....aA....A....A.....A...a. Cas9_Wa;..aaaa.aa...A.aHa.a.a..A...............aa...Aaa..A.a..A. Cas9_Cr;...........M........L....L.............................. Cpf1_Wa;.......................................L................ Cpf1_Cr;SLC26A3acgatatacacatctaccttgatccTGATGAtaaattcttgcaaaatctgttaaaa SEQ_DUP;.........................111222......................... DUP_NUM;J...............aC....aC.aC.........A........A.......aB. Cas9_Wa;.Baa..A......A.a.Ca..Aa....CaaH..........BaH..A....Ca..I Cas9_Cr;........N.........NLL.......M............N.............. Cpf1_Wa;.........M.k.........k.........k.....k.....M............ 
Cpf1_Cr; CDKN2AagctcctcagccaggtccacgggcaGACGACggccccaggcatcgcgcacgtccag SEQ_DUP;.........................111222......................... DUP_NUM;.....A...aA.....aaA..a..a..aA.....aA...Ja.A...A....A..a. Cas9_Wa;..Aaa..A.aa.a..Aa....aa.a...A...A..A..Aaaa...A.Ca.a.a.a. Cas9_Cr;.......................................L................ Cpf1_Wa;........................................................ Cpf1_Cr; USH2ActgtgccaaagggtggacccgcgggTGGCTGGCtgccagggcaacggcaatgtgattg SEQ_DUP;.........................11112222......................... DUP_NUM;A....Haaa.aa....aHaaa.aA..aA..A...aaA....aA...Ia.aC..aaA.. Cas9_Wa;..Aa.a....Aa.J........AaaHa......A...A..Aa....A..A..A..... Cas9_Cr;......N................................................... Cpf1_Wa;.....................N.....N................N.......N..... Cpf1_Cr; TPOgcgggccctgcttcctggccggagaCGGCCGGCcgcgccagcgaggtcccctccctga SEQ_DUP;.........................11112222......................... DUP_NUM;A....A......aA..aa.a..aA..aA..a.A...a.a.aA..........aJ.aA. Cas9_Wa;..AaaHa...Aaa..A.Baa...Aa.....A..Aa..Aa.a.aa..A.....aaaa.a Cas9_Cr;.........L........................NL...................... Cpf1_Wa;.......................................................... Cpf1_Cr;HADHAcaaatccttcctcttcacaccctccTGATTGATagatgtaaaagcccttcccagattt SEQ_DUP;.........................11112222......................... DUP_NUM;.........J.........I..aC..aC..aC.A.....A......I..aC.....a. Cas9_Wa;....BaH..CaaHbaa.a.Ba.a.aaaHaa...................Aaa.Baaa. Cas9_Cr;........M.L..........NL.......NL....NL.............N...... Cpf1_Wa;..........M.......k.........................k........N..k. Cpf1_Cr;HADHBtgggccactctgcagaccgactggcCGCTCGCTgcctttgctgtttctcggctggaac SEQ_DUP;.........................11112222......................... DUP_NUM;.......A..a...a...aA..A...A..A.....A..A......aA..aaB..HaaC Cas9_Wa;..Aa.....Aa.a.aH.A...Aa..A...Aa.a.a.a..Aa....A....Ba.a..A. Cas9_Cr;....N...L................................................. 
Cpf1_Wa;................................LLN.......LN.............. Cpf1_Cr; MSH6tttttttggagatgattttattcctAATGAATGacattctaataggctgtgaggaaga SEQ_DUP;.........................11112222......................... DUP_NUM;...aa.aC.aC............HaB..a...........aA..a.a.aaB.a.aa.A Cas9_Wa;....A.....................Baa..........A..BaH......A...... Cas9_Cr;........N....N..............k...........k..NL............. Cpf1_Wa;....N..LN..........N............LN..........Lk.......k.... Cpf1_Cr; MSH6tttaaggtgaaagtacattttttgtTGAATGAAttaagtgaaactgccagcatactca SEQ_DUP;.........................11112222......................... DUP_NUM;.aa.ab..A..........A.HaB.HaB.....a.aB....A..JA.........A.. Cas9_Wa;....................A..J........................A..Aa..A.. Cas9_Cr;..NL...k................k...................k..N........N. Cpf1_Wa;.....LN..LN...N...Lk..........M.............N............. Cpf1_Cr; MSH6ttactttaacaggaagaggtactgcAACAAACAtttgatgggacggcaatagcaaatg SEQ_DUP;.........................11112222......................... DUP_NUM;.......aaB.a.aA....A............aC.aaa..aA.....A.....A..A. Cas9_Wa;........A.....A...........A..a..A...A...........A..A.....A Cas9_Cr;....N...NL..N.M........k....k............................k Cpf1_Wa;....N..k...........L......N.....k............k...N........ Cpf1_Cr; MSH6cattattttcaactcactaccattcATTAATTAgtagaagattattctcaaaatgttg SEQ_DUP;.........................11112222......................... DUP_NUM;.............................A..aB.aC.............A..A..a. Cas9_Wa;.a...A..J....Ba..A.aHa..Aa..BaH...................BaHa.... Cas9_Cr;..........M..............N....kL.............NL..N...N.... Cpf1_Wa;.......N........N...........k.......................M..... Cpf1_Cr; MSH6aaaagcaagagaatttgagaagatgAATCAATCagtcactacgattatttcggtaact SEQ_DUP;.........................11112222......................... DUP_NUM;A...aHaB....a.aB.aCHaB........A....I..aC.......aA......... Cas9_Wa;A.........A.....................Ca..ca..Ia.a..A.......Ba.. 
Cas9_Cr;....N..NL............................k.................... Cpf1_Wa;...LN...N........................N...N...N...M.N..LLN...M. Cpf1_Cr; MSH6aagcaagagaatttgagaagatgaaTCAGTCAGtcactacgattatttcggtaactaa SEQ_DUP;.........................11112222......................... DUP_NUM;..aHaB....a.aB.aCHaB....A...A....I..aC.......aA........... Cas9_Wa;........A.....................Ca..Ia...a.a..A.......Ba.... Cas9_Cr;..N..NL............................k...................... Cpf1_Wa;.LN............................N...N...N...M.N..LLN...M.N. Cpf1_Cr;ALMS1tgacctgtcatgtatggcaacagatAGTAAGTAtatcaaggcaatagtagaacacaaa SEQ_DUP;.........................11112222......................... DUP_NUM;..A....A...aA.....aC..A...A........aA.....A.JaB..........J Cas9_Wa;Baa.....Aa...a........a..A..............Ca....A..........A Cas9_Cr;..k.....kL....N..NLL..N.............M..................M.M Cpf1_Wa;..M....N...M....N....N.......N....k........M....N......... Cpf1_Cr;DGUOKcagtgctggtgttggatgtcaatgaTGATTGATttttctgaggaagtaaccaaacaag SEQ_DUP;.........................11112222......................... DUP_NUM;A..aa.A.HaaC.A.....aC.ac..aC.......a.aaB.A...........AB.a. Cas9_Wa;A..Baa....A..J.....J....a................Ba...........Aa.. Cas9_Cr;..k.................NLL...........N................N...... Cpf1_Wa;.....................LN...N...k...N..N...................L Cpf1_Cr; ORC4cggcagtcataaatgggtgcgatgcTGTTTGTTactcgatttaaagcaagcatctagg SEQ_DUP;.........................11112222......................... DUP_NUM;.A.......Haaa.a.aC.A..A...A...I..aC......A...A......aaaB.. Cas9_Wa;.Ca..A..A...a...........A..J.A.........A.aH........A...A.C Cas9_Cr;N....N...NLL..N....................................k..N... Cpf1_Wa;.....................k....N.........LLk............k.....N Cpf1_Cr; NEBtcccttgcccatgttttctttgtatAACAAACAcctgtgcgataagaaagcatccaga SEQ_DUP;.........................11112222......................... DUP_NUM;..A.....A........A.....J........a.a.aC...aB..A......aB.... 
Cas9_Wa;.....Baaa...Aaa......Ba.........a...A.aa....A..J.......A.C Cas9_Cr;......................NLL..N..........k...k...M........... Cpf1_Wa;..M.N..k............M.N..k..........k.....N.........N..... Cpf1_Cr; NEBactctctgtatctctggggtgtccaAAACAAACagtctcataatacgacatggacttc SEQ_DUP;.........................11112222......................... DUP_NUM;...A.......aaaa.A.............A....J......a....aa........A Cas9_Wa;aaa...A.aHa....Ca.a........aaJ...A...A...a.a......A..A.... Cas9_Cr;.....N....k...N...L.............M............L............ Cpf1_Wa;...k....k.........M.N.........L........................... Cpf1_Cr; TTNaatgctaatggcattcaaaacaatgGTATGTATcccctgctttaattgttagcccatc SEQ_DUP;.........................11112222......................... DUP_NUM;.....aA.............aA...A........A........A...A.......... Cas9_Wa;Aa.......A......A..BaH...A..........JCaaaa..A............A Cas9_Cr;....k..........k....................NL............M...M.L. Cpf1_Wa;N.....................N........................L....N..... Cpf1_Cr;BARD1gaagctttactcacaacatatctgaCTTTCTTTcttacttcgagggctaaaccacatt SEQ_DUP;.........................11112222......................... DUP_NUM;...................a.................a.aaA................ Cas9_Wa;.A.a.....A....A.aHa..A...Ca...A..Ba..Ba...A.Ba.....A....Aa Cas9_Cr;k..N............k............k............M.......k...k..N Cpf1_Wa;...........................k...........N...LN............. Cpf1_Cr;UGT1A8tcattcagatcacatgaccttcctgCAGCCAGCgggtgaagaacatgctcattgcctt SEQ_DUP;.........................11112222......................... DUP_NUM;...aC......a........A..A...aHaaa.aB.aB....A......A........ Cas9_Wa;.aa.a.a..BaH..Ca.a....Aa.Baa..A..Aa..A..........A...A.a... Cas9_Cr;..........................NL..............NL.............. Cpf1_Wa;................LN..N.......................N............. Cpf1_Cr; TREX1gatgtcctggccctgctcagcatctGTCAGTCAgtggagaccacaggccctgctgcgg SEQ_DUP;.........................11112222......................... 
DUP_NUM;....aA....A....A.....A...A...a.aa.a......aA....A..a.aaHaaa Cas9_Wa;..........aa...Aaa..A.a..A.Ca...a...a........Aa.a...Aaa..A Cas9_Cr;.......................................................... Cpf1_Wa;..............L...........................L............... Cpf1_Cr;COL7A1cagcctccgacacacgacccacaggcTCAGTCAGgggctggggacagaggcaaggtaag SEQ_DUP;.........................11112222......................... DUP_NUM;.J.a......a.......aA....A...aaaA..aaaa...a.aA...aA...aaaA. Cas9_Wa;.Ca..A..Aa.a..A.a.a..AaaHa...A.a...a.....A......A.....A... Cas9_Cr;............N............................................. Cpf1_Wa;...................L.........N....N................N...... Cpf1_Cr;IQCB1gaaaacccttccaataggcttgaatCAAGCAAGcatgctgcttgatgtagtttctgaa SEQ_DUP;.........................11112222......................... DUP_NUM;............aA..HaB.....A...A...A..A...aC.A..A.....aB...A. Cas9_Wa;...a......AaaHBaa......A.....Ca..IA...A...A..A...........B Cas9_Cr;.N...k........k..N.............NLL........N............... Cpf1_Wa;LN...N...N........................Lk......Lk....N........M Cpf1_Cr; DOK7acagatgaactgggctcactgctcaGCCTGCCTgccagcagcgggggcccccgagccc SEQ_DUP;.........................11112222......................... DUP_NUM;C.aB...aaa......A....A...A...A...A..a.aaaaA.....a.A....A.. Cas9_Wa;...Aa.a.......A....A.a.a..A.a..Aa..Aa..Aa..A..A.....Aaaaa. Cas9_Cr;.......................................................... Cpf1_Wa;.......................................................... Cpf1_Cr; WFS1gccatcatggagatcaaggagtaccTGATTGATtgacatggcctccagggcaggcatg SEQ_DUP;.........................11112222......................... DUP_NUM;....aa.aC....aa.A..I..ac..aC..a....aA......aaA..aAJ..A.... Cas9_Wa;..A.a.aa.Ca.......Ca........Aa.J.........A....Aa.aa....A.. Cas9_Cr;...................................................N...N.. Cpf1_Wa;.......................................................... 
Cpf1_Cr;CC2D2ActctttattaccattgagccccagcTGGTTGGTtcctggagagtccattcgagaaaag SEQ_DUP;.........................11112222......................... DUP_NUM;...........a.A.....A..aA..aA.....aaHa.A.......a.aB...aa... Cas9_Wa;...AaaHa.......Aa......aaaa..A........Baa........aa..BaH.. Cas9_Cr;..................N........k..N.....N..............N...NL. Cpf1_Wa;................L...............k......N.......M......LLk. Cpf1_Cr;SPINK5gatgggaaaacatatgacaacagatGTGCGTGCactgtgtgctgagaatgcgtgagta SEQ_DUP;.........................11112222......................... DUP_NUM;aaB........a......aC.a.aJa.A....a.a.A..aHaN..a.aHa.A...... Cas9_Wa;.A.a...........A......A..A.......A..JA.aJ.....AJ.J.....A.. Cas9_Cr;.N.M.........k......................M..................... Cpf1_Wa;.........................N.................LN............. Cpf1_Cr;SH3TC2cactagactcacggtcaggcaggcaGGCCGGCCagcagggcacctgccttttccaaca SEQ_DUP;.........................11112222......................... DUP_NUM;.a......aA...aA..aA..aA..aA...A.JaaA.....A............aaa. Cas9_Wa;A....A.a....A.aHa...a...A...A...Aa..Aa..A....A.aa..Aa...Ba Cas9_Cr;............NL.....N...................................... Cpf1_Wa;.................................N.......L................ Cpf1_Cr;DDX41cagacatacctggttttgatggggtCATCCATCatacgtaatgcccttagccatctcc SEQ_DUP;.........................11112222......................... DUP_NUM;.......aA....aC.aaaA......J......A....A......A............ Cas9_Wa;A....A..JA...Aa...............a.CaaHCa...A......Aaa....aa. Cas9_Cr;......................................k............L...... Cpf1_Wa;............M.....N..............................N.....LN. Cpf1_Cr; CDSNaccgctggagtcacccttcccagtgAGGCAGGCaggggtcgttaggggaggtgatacg SEQ_DUP;.........................11112222......................... DUP_NUM;..aa.A............a.a.aA..aA..aaaA..A...aaaa.aa.aC...a.a.a Cas9_Wa;A.....Aa.a......a.aaaHBaaa.......A...A......a............. Cas9_Cr;.......................................NLL............... 
Cpf1_Wa;.........................L......M..................N...... Cpf1_Cr;LAMA2ggcataaagtcactgccaacaagatCAAACAAAcaccgcattgagctcacagtcgatg SEQ_DUP;.........................11112222......................... DUP_NUM;....A.....A.......aC.....J.......A....a.A......A..aC.aaaB. Cas9_Wa;.A.....A.......a.a..Aa..A....Ca...A...A.aa.a......A.a.a... Cas9_Cr;.......k.................................................. Cpf1_Wa;.....k...k...........................LLN.......LLN.....k.. Cpf1_Cr;SERAC1tttgacttccaacgaggggaagagaAGATAGATagcgaatattaacagagtattcagc SEQ_DUP;.........................11112222......................... DUP_NUM;.........a.aaaaB.a.aB.aC..aC..aHaB........Ha.A......A..... Cas9_Wa;...Ba.....A.Baa..A......................A.........A....... Cas9_Cr;......N.....N.......N...k....NLL.......................... Cpf1_Wa;...N..M...M....LN.....N...............k..................L Cpf1_Cr;SERAC1cgctgaggctggtgtcatactccacAGATAGATataattcggagagcaggacagtctt SEQ_DUP;.........................11112222......................... DUP_NUM;a.aA..aa.a............aC..aC........aa.a.A..aa...A......A. Cas9_Wa;aa...a.a.....A......a.J.A.aa.a.............BaH.....A....A. Cas9_Cr;...........N......L.........................L...........M. Cpf1_Wa;......M...M.M.N....L......................N.........N...k. Cpf1_Cr;SLC26A4acacagccttctctgtctctcttggCAGTCAGTcggtcttggcagctgttgtaattgc SEQ_DUP;.........................11112222......................... DUP_NUM;.A........A........aA..A...A..aA....aA..A..A..A.....A..... Cas9_Wa;......A.a..Aa.Ba.a...a.a.a....A...a...a...a....A..A....... Cas9_Cr;............NLL................N............N............. Cpf1_Wa;...............................N......N....Lk...L......... Cpf1_Cr; CFTRcacttcttggtactcctgtcctgaaAGATAGATattaatttcaagatagaaagaggac SEQ_DUP;.........................11112222......................... DUP_NUM;..J.aA.......A....aB..aC..aC............aC..aB..a.aa...A.. Cas9_Wa;..Ba.a.a.Ba......A.aa...aa...................Ba........... 
Cas9_Cr;.......N..NL.......k......N..N...........................N Cpf1_Wa;.Lk...M...M....N.....N..M...k.........................L... Cpf1_Cr; RP1aaatgattggacagttttcatatagTGAATGAAgaaagggaaagtggggaaaacaagt SEQ_DUP;.........................11112222......................... DUP_NUM;aC..aa...a..........aHaB..aB.aB..aaaB..a.aaaaB......A..Ha. Cas9_Wa;................a.....Ba.................................. Cas9_Cr;M.............NL.............N.........kL...M............. Cpf1_Wa;M....LN..LN..k...LLk......LLK....n........................ Cpf1_Cr; GLDCataagcccaggaaatgggcaagatgGAACGAACtggagccccatggggccgcactgac SEQ_DUP;.........................11112222......................... DUP_NUM;A....aaB...aaA...aC.aaB..aB...aa.A......aaaa..A....a..aA.. Cas9_Wa;..........Aaa..........A.........A...A.....Aaaa......Aa.a. Cas9_Cr;.....L............N....M.................................. Cpf1_Wa;...LLN..LN...L............................................ Cpf1_Cr; GNEctgagatacgtacctagccacatgcGAATGAATgatgctcatgtagtctttgttcttg SEQ_DUP;.........................11112222......................... DUP_NUM;aCJ..A......A......aHaB.HaB..aC.A.....A..A.....A.....aA..a Cas9_Wa;.....a.......A...Aa.J.Aa.a...A............A.a.......a..... Cas9_Cr;.......N.................................................. Cpf1_Wa;....LN..LN...................................N........M... Cpf1_Cr;CYP17A1gagtcgatcagaaagaccaccttggGGATGGATgccttccagggagggcagctgccca SEQ_DUP;.........................11112222......................... DUP_NUM;.aC...aB..a........aaaaCHaaC.A.......aaa.aaA..A..A........ Cas9_Wa;.........a..Ca.......Aa.aa.............Aa.Baa........A..A. Cas9_Cr;.....N.....NL....k..........................N............. Cpf1_Wa;....L...L............L...................................L Cpf1_Cr; ATMactacacaaagagaatctagtgattACAGACAGtgtcccttgcaaaaggaagaaaata SEQ_DUP;.........................11112222......................... DUP_NUM;......aHaB....Ia.aC.....a...a.A......A.....aaB.aB.....aB.. 
Cas9_Wa;IA.a..A..A.a........Ca..I......A...A.....aaa...A.......... Cas9_Cr;.......k........N.............................N........... Cpf1_Wa;......................k....LN..k......N.........LLN...N..k Cpf1_Cr;FOXRED1aagtttccctggataaacacagaggGAGTGAGTggctttggcgtcttatggtgaggct SEQ_DUP;.........................11112222......................... DUP_NUM;.....HaaCJ.......a.aaa.aHa.a.aA....aa.A......aa.a.aA...A.. Cas9_Wa;....A.....Baaa........A.a...............A.....A..a........ Cas9_Cr;...............NL..........kLL............................ Cpf1_Wa;...L.............................................L........ Cpf1_Cr;C12orf65ggctttgggagaagctgacgttgttATCCATCCccaggaatagctgtcactccggtcc SEQ_DUP;.........................11112222......................... DUP_NUM;..aaa.aB.A..a..A..A.............aaB...A..A.......aA....aC. Cas9_Wa;.a.aa..A...........A...A.......CaaHCaaaa........A...a.a.aa Cas9_Cr;M....................L.....k...............N..N.M.L...L... Cpf1_Wa;................LN..............................N..N...... Cpf1_Cr; GJB2tccacagtgttgggacaaggccaggCGTTCGTTgcacttcaccagccgctgcatggag SEQ_DUP;.........................11112222......................... DUP_NUM;..a.A..aaa....aA...aa.A...a..A..........A..A..A...aa.aB.A. Cas9_Wa;..A...aa.a.......J..A....Aa...A..Ba....A.a.Ba.aa..Aa.a..A. Cas9_Cr;L.......................L.......N.................NL..N... Cpf1_Wa;.................................L...N................M... Cpf1_Cr;BRCA2ctgaaaatgaagataacaaatatacTGCTTGCTgccagtagaaattctcataacttag SEQ_DUP;.........................11112222......................... DUP_NUM;....aB.aC.....J.......A...A..A...A..aB..............HaB... Cas9_Wa;.....A...............A.......A..a...A..Aa.........BaHa.... Cas9_Cr;.........N..................................M......N...... Cpf1_Wa;M...................k.......M.N......N..................N. Cpf1_Cr;BRCA2ttgtttctccggctgcacagaaggcATTTATTTcagccaccaaggagttgtggcacca SEQ_DUP;.........................11112222......................... 
DUP_NUM;......aa..A....aB.aa...........A.......aa.A..a.aA...J..... Cas9_Wa;..A..J....Ba.aa..A..A.a......A.......Ba..Aa.aa............ Cas9_Cr;.....N....L....k.......k...k...L..................k...kL.. Cpf1_Wa;....................N...............k.....Lk.......M.k...k Cpf1_Cr;EIF2AK4tcctcagcagctcggtggagtggagCACTCACTtcgggcgagcgctcggccagtgccc SEQ_DUP;.........................11112222......................... DUP_NUM;..a..A...aa.aa.a.aa.A..........aaa.a.a.A...aA...a.A...A... Cas9_Wa;.aaaHCaaHa..A..A.a............A.a.aHa.Ba...A...A.a.a..Aa.. Cas9_Cr;.......................................................NL. Cpf1_Wa;L......................................................... Cpf1_Cr; HEXAgaaatccttccagtcagggccatagGATAGATAtacggttcaggtaccagggggcaga SEQ_DUP;.........................11112222......................... DUP_NUM;........A...aaA....GaaC..aC.....aA..J.aA.....aaaaA..a.a.aB Cas9_Wa;.........CaaHbaa...a....Aa..............A...Ba.....Aa.J... Cas9_Cr;...........NLL....M...........NLL......................M.. Cpf1_Wa;M....M...M.M............................N.......LLN....... Cpf1_Cr;PEX12acagaaaggccagtagacagggataAGGCAGGCaacacccccaacagctttcttcaga SEQ_DUP;.........................11112222......................... DUP_NUM;B..aa...a..a...aaaC...aA..aa..............a.........a.A.Ha Cas9_Wa;A.aaaHa.......Aa......A..........A...A..A.aaaaa..A..A..Ba. Cas9_Cr;.......................................................... Cpf1_Wa;.M.N........N........N...................................N Cpf1_Cr; MKS1agacagtgcaagcggaaggtgacagTGCCTGCCtgtggtctctgtgcggagtccaaag SEQ_DUP;.........................11112222......................... DUP_NUM;.a.A...a.aaB.aa.a...a.A...A...a.aA.....a.a.aa.A......a.aA. Cas9_Wa;.A..A...A....A..JA.........A....Aa.JAa......a.a....A..J..a Cas9_Cr;.......................................................... Cpf1_Wa;..........................L......k........................ 
Cpf1_Cr; NPC1tactcacggagctgcccatgtgggcAAGTAAGTgcctcttccgcgcgctccacgcggc SEQ_DUP;.........................11112222......................... DUP_NUM;...aa.A..A.....a.aaA...A...a.A........a.a.A......a.aA..A.. Cas9_Wa;a......A.aha....A..Aaa.......A.........Aa.a.Baa.a.a.a.aa.a Cas9_Cr;.......................................................... Cpf1_Wa;....N..N................................................LN Cpf1_Cr;PNPLA6agcgttgtacgcggaggagcgcagcGCCAGCCAgccgcacgaagcagcgggcccggga SEQ_DUP;.........................11112222......................... DUP_NUM;..A...a.aa.aa.a.A..a.A...A...a..A...aB.A..a.aaA...aaa.a.aa Cas9_Wa;.Ca....A......A.aJ......A.a..A.aa..Aa..Aa.a.a....A..A...Aa Cas9_Cr;....L..N.......kL..........N.............................. Cpf1_Wa;...................LN.............L.........N............. Cpf1_Cr; LDLRgcagccagctctgcgtgaacctggaGGGTGGGTggctacaagtgccagtgtgaggaag SEQ_DUP;.........................11112222......................... DUP_NUM;...A....a.a.aB....aaHaaaHaaa.aA......a.A...a.a.a.aaB.aA... Cas9_Wa;A.aa..A..aa..A.a..A.....Aa..............A..A.....Aa.J..... Cas9_Cr;.............L............................................ Cpf1_Wa;.L................N..............LN............L.......... Cpf1_Cr; FKFPggcccccgtgtcaccgtcctggtgcGGGAGGGAgttcgaggcatttgacaacgcggtg SEQ_DUP;.........................11112222......................... DUP_NUM;...a.A.....A....aa.a.aaa.aaa.A...a.aA.....a.....a.aa.A...a Cas9_Wa;A..Aa..Aaaaa....a.aa..aa.....A..J.......Ba....A......A..a. Cas9_Cr;.........................................................N Cpf1_Wa;.....L...L..................N............................. Cpf1_Cr;CHEK2atttaccttccaagagtttttgacaTGATTGATgtattcatctcttaatgccttagga SEQ_DUP;.........................11112222......................... DUP_NUM;........Ha.A.....a.I..aC..aC.A...............A....HaaC.... Cas9_Wa;..........Aa.Baa............A.............BahCa.a......Aa. Cas9_Cr;........k.M.M............k....NLL.........k........N...... 
Cpf1_Wa;.........................N..........M.k................... Cpf1_Cr;NPHP4aactcggcgacccccagcgtggcgtGGAGCGGAGCgtgtgctccgtggtgatggccaggc SEQ_DUP;.........................1111122222......................... DUP_NUM;.aa.a.......a.a.aa.a.aa.a.aa.a.a.a.A....a.aa.aC.aA...aA....A Cas9_Wa;aH.....A.aH.A..Aaaaa..A....A......A....A.....AJaa..........A Cas9_Cr;..N..N...................................................... Cpf1_Wa;....L....L.................................................. Cpf1_Cr;IDUAcaggcttcctgaactactacgatgcCTGCTCTGCTcggagggtctgcgcgccgccagccc SEQ_DUP;.........................1111122222......................... DUP_NUM;......aB........aC.A...A....A...aaHaaA...a.a.A..A...A....A.. Cas9_Wa;a..Aaa...A.Baa....A..A..A....Aa..A.a..a.a.......a..A.a.aa.aa Cas9_Cr;..............L.............NL.............................. Cpf1_Wa;...............L....................................L....... Cpf1_Cr;CCNOaggtctgtagatctagctgcgccacGGGCTGGGCTgggccgggccgggcagggggctacc SEQ_DUP;.........................1111122222......................... DUP_NUM;..A..aC....A..a.A....aaA..aaA..aaA..aaA..aaA..aaaaA......... Cas9_Wa;.a.......a......Ca...A..A.aa.a...A....A....Aa...Aa...A...... Cas9_Cr;............................................................ Cpf1_Wa;............................................................ Cpf1_Cr;CCNOgctgcgccacgggctgggccgggccGGGCAGGGCAgggggctaccaccccgcgccgcaga SEQ_DUP;.........................1111122222......................... DUP_NUM;.A....aaA..aaA..aaA..aaA..aaA..aaaaA..........a.A..A..a.aaaA Cas9_Wa;.Ca...a..A.aa.a...A....Aa...Aa...A....A......A..Aa.aaaa.a.aa Cas9_Cr;............................................................ Cpf1_Wa;............................................................ Cpf1_Cr;MYBPC3ctcatgcccttgagcctctttagcaTGCCGTGCCGcgcaggtcagtgacgccgtactgga SEQ_DUP;.........................1111122222......................... DUP_NUM;.A.....a.A........A...A..a.A..a.A..aA...a.a..a..A....aaB.aa. 
Cas9_Wa;a.a.aa.a...Aaa.....Aa.a.....A...Aa...Aa.a.a....a.....A.aa... Cas9_Cr;........N..N....................N.........k................. Cpf1_Wa;....................................LLN..................... Cpf1_Cr;CHRNEagatgagggtgggggtagcttaccaGTGAGGTGAGatgagattcgtcagggtgaccttga SEQ_DUP;.........................1111122222......................... DUP_NUM;aHaaa.aaaaA..A.......a.a.aa.a.aCIa.aC...A..Haaa.a.....a.aA.. Cas9_Wa;.A.....................A...Aa..................BaH.a.......A Cas9_Cr;...............k....N.....................N................. Cpf1_Wa;............................................................ Cpf1_Cr;BRCA1attgtgctcactgtacttggaatgtTCTCATCTCAtttcccatttctctttcaggtgaca SEQ_DUP;.........................1111122222......................... DUP_NUM;.A...J..A.....aaB..A.............................aa.a....HaB Cas9_Wa;..A........A.aJa....A..J......Ba.a.Ca.a...Baaa...Ba.a..Ba... Cas9_Cr;.k.........N..N.........N..............N.......N...........k Cpf1_Wa;.........................................LN.........k....... Cpf1_Cr;SGCAtacaatcgggacagctttgataccaCTCGGCTCGGcagaggctggtgctggagattgggg SEQ_DUP;.........................1111122222......................... DUP_NUM;...aaa...A..J.aC........aA...aA..a.aA..aa.A..aa.aC..aaaa.... Cas9_Wa;a..Aa..A..Ca..I.A..A.......Aa.a.aH.A.a..A.....A.....A..J.... Cas9_Cr;kLL....L...............................k.................... Cpf1_Wa;............................L........L......N............... Cpf1_Cr;PKNPaggaagcagcggcaggggacgcccgCGGCTCGGCTcgggcacactggacgtacctgtggg SEQ_DUP;.........................1111122222......................... DUP_NUM;.A..a.aA..aaaa..A...a.aA...aA..JaaA......aa..A.....a.aaaaaB. Cas9_Wa;...........A..A..a......A.aaa.a..A.a..A.a...A.a.a....A...Aa. Cas9_Cr;......L..................................................... Cpf1_Wa;........................L..............LLN.......L.......... 
Cpf1_Cr;DNAAF3gtcccagtcgctgacaccgcgccggGCGTCGCGTCgtagcgggagcccaggtagtggcgc SEQ_DUP;.........................1111122222......................... DUP_NUM;..A..AJ.a.....a.A..aaa.A..a.A..A..a.aaa.A....aA..a.aa.A..aa. Cas9_Wa;A.a....aaa...a.a...A.aa.a.aa...A..a.a..a....A.....Aaa....... Cas9_Cr;............NL...........L.................................. Cpf1_Wa;....................L....................................... Cpf1_Cr;MUTYHcccttcctcccctggagtcacctgcATCCATATCCATccggtatagtagttgatcacagtgg SEQ_DUP;.........................111111222222......................... DUP_NUM;.........aa.A......A...............aA....A..A..aC.....a.aA.... Cas9_Wa;AaaHBaaa.Baa.aaaa......a.aa..A.CaaH..CaaHCaaH......J......Ca.a Cas9_Cr;.....................NLL..NL...L..................L...M.L...L. Cpf1_Wa;........M............M....................N................... Cpf1_Cr;SLC22A5aggatgaccatatcagtgggctattTTGGGCTTGGGCtttcgcttgatactcctaacttgca SEQ_DUP;.........................111111222222......................... DUP_NUM;.a.........a.aaA.......aaA...aaA.....A.J.aC............A...aaa Cas9_Wa;...Ba.......Aa...Ca......A.........A.....A..Ba.a......A.aa...A Cas9_Cr;....k..N............kL............M.............k.....N......k Cpf1_Wa;.........................M.......N..........L...........LN.... Cpf1_Cr;FOXC1agcagcagctcgtcgtccctgagtcACGGCGACGGCGgcggcggcggcggcggcgggggagg SEQ_DUP;.........................111111222222......................... DUP_NUM;A..A...A..A....Ha.A....aa.a..aa.aa.aa.aa.aa.aa.aa.aaaaa.aA...a Cas9_Wa;.a.aa..A..A..A.a..a..aaa.....a.a..A..A..A..A..A..A..A..A..A... Cas9_Cr;..............L........................L...................... Cpf1_Wa;....................................L......................... Cpf1_Cr;RMRPttcagcacgaaccacgtcctcagctTCACAGATCACAGAgtagtattttatagccctaaagaaa SEQ_DUP;.........................11111112222222......................... DUP_NUM;A...aB.....A......A.......aC....Ha.A..A.........A.......aB....a. 
Cas9_Wa;...Aa.Ba..A.a...Aa.a..aa.a..A.Ba.a...Ca.a............J....Aaa... Cas9_Cr;.......................NL......................NL............... Cpf1_Wa;............................M.......k...k....................... Cpf1_Cr;CHRNRggtcccctgccggtgcctctgccccTCAAACATCAAACAcgagctcgctccgtggctttttcag SEQ_DUP;.........................11111112222222......................... DUP_NUM;....A..aa.A.....A............J......a.A...A....a.aA........A.... Cas9_Wa;H.aa....aaaa..Aa....Aa.a..Aaaa.a...A.Ca...A.a...A.a.a.aa....A... Cas9_Cr;....................L.....L..................................... Cpf1_Wa;......k......k..............................................L... Cpf1_Cr;RBCK1ccaggtccccgcctcataccagcccGACGAGGGACGAGGaggagcgagcgcgcctggcgggcga SEQ_DUP;.........................11111112222222......................... DUP_NUM;A.....a.J........A...a..a.aaa..a.aa.aa.a.a.a.a.A...aa.aaa.a.aa.a Cas9_Wa;.Aa..Aa....aaaa.aa.a...Aa..Aaa..A......A.........A...A.a.aa...a. Cas9_Cr;.............................L.................................. Cpf1_Wa;..........L..................................................... Cpf1_Cr;ABHD12ctgtacttgccactgaaaatggatgGCTCTTAGCTCTTAgcttcttcgcggatattagtgaatg SEQ_DUP;.........................11111112222222......................... DUP_NUM;....A.....aB...HaaC.aA......A......A.......aHaaC.....aHaB..aa.aC Cas9_Wa;....Ca..I.A..JHa.a.............A.a....A.a....a.ba.Ba.a.......... Cas9_Cr;......n......k...............N......................N......N.... Cpf1_Wa;............................L.M.......LN..L........N..M.N....... Cpf1_Cr;MSH6tggctttaatgcagcaaggcttgctAATCTCCCAATCTCCCagaggaagttattcaaaagggacat SEQ_DUP;.........................1111111122222222......................... DUP_NUM;......A..A...aA...A...................a.aaB.A..........aaa.....aB. Cas9_Wa;..A.....A.......A..A....A...A...Ca.aaa..Ca.aaa............BaH..... Cas9_Cr;........N.............M.....k..............N.........L.......L.... 
Cpf1_Wa;....N.......N...........LN........k....L...M...k.....N....N....... Cpf1_Cr;LAMA2ttgaagaagaggaagaagatacagaACGTGTTCACGTGTTCtccagcttatgattatcttagaggt SEQ_DUP;.........................1111111122222222......................... DUP_NUM;.aB.a.aaB.ab.aC....aB..a.A.....a.A.......A..I..aC........a.aA.ha.A Cas9_Wa;...Aa.....................A....A....BaJa....BaJaa..A.........Ca... Cas9_Cr;.....k......M....M.....N.............................NL......N...L Cpf1_Wa;...N.............................................................. Cpf1_Cr;TCTN2agacgtcaatcctccttttgatcagCTCTGCTCCTCTGCTCtgctgggacgacgacacgtggtgtc SEQ_DUP;.........................1111111122222222......................... DUP_NUM;A..............aC...A....A.......A....a..aaa..aJ.a....a.aa.A..I..a Cas9_Wa;J.......A..a..CaaHaa......Ca..A.a..A.aa.a..A.a..A.....A..A..A.a... Cas9_Cr;...NL...........k.......................k......................... Cpf1_Wa;.........................L........................................ Cpf1_Cr;BRCA2gaatttgacaggataatagaaaatcAAGAAAAAAAGAAAAAtccttaaaggcttcaaaaagcactc SEQ_DUP;.........................1111111122222222......................... DUP_NUM;..a..HaaC.....aB.......aB......aB............aA......J..A.......aC Cas9_Wa;.............A..............Ca..I.............CaaHI.....A.Ba...... Cas9_Cr;..............N..N.........k...................................... Cpf1_Wa;....N..k.......k.........k........k..................k............ Cpf1_Cr;TCAPtcatggctacctcagagctgagctgCGAGGTGTCGAGGTGTcggaggagaactgtgagcgccggga SEQ_DUP;.........................1111111122222222......................... DUP_NUM;aA........a.A..a.A..a.a.aa.A..a.aa.A..aa.aa.aB...a.a.a.A..aaa.aA.. Cas9_Wa;.....Ca....A..Aa.a....A....A..A.......a.J.....a.J.......A......a.a Cas9_Cr;.................................................................. Cpf1_Wa;.....................L......N.............L................N..LLN. 
Cpf1_Cr;LDLRgatggtggccccgactgcaaggacaAATCTGACAATCTGACgaggaaaactgcggtatgggcgggg SEQ_DUP;.........................1111111122222222......................... DUP_NUM;a.aA....a...A...aa........a.......a..a.aaB.....a.aA...aaa.aaaA..Ha Cas9_Wa;.a...........Aaaa..A..A.....A...Ca..IA..Ca..IA........A..A.......J Cas9_Cr;....L....L........................................................ Cpf1_Wa;...k........N..........Lk......................................... Cpf1_Cr;ITPAagcctatgcgctctgcacgtttgcaCTCAGCACCTCAGCACcggggacccaagccagcccgtgcgc SEQ_DUP;.........................1111111122222222......................... DUP_NUM;...a.A.J..A...AJ..A...J..A....J..A....aaaa......A...A...a.a.A...A. Cas9_Wa;....a..Aa....A.a.a..A.a.....A.a.aH.A.aa.a..A.aa.....AaaH..Aa..Aaa. Cas9_Cr;........NL..................M..............k...................... Cpf1_Wa;.......................L.....N............................L....... Cpf1_Cr;PEX1tccagcaggacaacagatggctgcaTCCACACTGTCCACACTGcctctgagaaagccacctctagggt SEQ_DUP;.........................111111111222222222......................... DUP_NUM;A..aa......aC.aA..A..J.......AJ.......A.....a.aB..A........HaaA..... Cas9_Wa;...a.CaaH.A....A..A......A..A.CaaHa.a...aa.a.a..Aa.a........Aa.aa.a. Cas9_Cr;..L.....................L........................L........L......... Cpf1_Wa;..............................k...............................N..... Cpf1_Cr;IGFALSgcctgttgcccgccagcaccagctcGCGCAGGCTGCGCAGGCTgcccaggccgcggaacgccgcatcg SEQ_DUP;.........................111111111222222222......................... DUP_NUM;A..A...AJ..A.....A...a.A..aA..a.A..aA..A....aA..a.aaB..A..A....aaaa. Cas9_Wa;..Aa..Aa.....Aaa.aa..A.aa..A.a.a.a...A..A.a...A..Aaa...Aa.a....A.aa. Cas9_Cr;............................N....................................... Cpf1_Wa;.................................LLN............L................... 
Cpf1_Cr;SEPN1gcagccgccgccagccgcagccatgGGCCGGGCCCGGCCGGGCCCggccgggccaacgcgggccgcccag SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;..A..A...A..A..A....aaA..aaA...aA..aaA...aA..aaA.....a.aaA..A....A.... Cas9_Wa;a.aa..A..Aa.aa.aa..Aa.a..Aa.....Aa...Aaa..Aa...Aaa..Aa...Aa..A.a...Aa. Cas9_Cr;....kL...NLL.......................................................... Cpf1_Wa;.................................N.................................... Cpf1_Cr;CYP1B1agcccaagacagaggtgttggcagtGGTGGCATGAGGTGGCATGAggaatagtgacaggcacaaagctgg SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;...a...a.aa.A..aA..a.aa.aA...a.aa.aA...a.aaB...a.a.J.aA......A..aa.aB. Cas9_Wa;.......Aaa....A..........JA........A.........A..............A...a.a... Cas9_Cr;..........N.............................N............................. Cpf1_Wa;.........................LN..............k.....L...N.............M.k.. Cpf1_Cr;CCYP27A1gtgcaggcgcgcgagcacaacccatGGCTGCGCTGGGCTGCGCTGggctgcgcgaggctgaggtgggcgc SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;.aa.a.aJa.A..........aA..a.A..aaA..a.A..aaA..a.a.a.aA..a.aa.aaa.A..a.a Cas9_Wa;aH......A..JA.a.a...A.a..AaaH...A..A.a....A..A.a....A..A.a....A....... Cas9_Cr;...................................................................... Cpf1_Wa;...................................................................... Cpf1_Cr;RMRPttcagcacgaaccacgtcctcagctTCACAGAGTATCACAGAGTAgtattttatagccctaaagaaattg SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;A...aB.....A......A......Ha.A......Ha.A..a.........A.......aB....a.A.. Cas9_Wa;...Aa.Ba..A.a...Aa.a..aa.a..A.Ba.a......Ca.a............J....Aaa...... Cas9_Cr;.......................NL......................NL........M............ 
Cpf1_Wa;...............................M.......k...k.........................k Cpf1_Cr;SPG11catggaggcatttgcttgtcagcacTTCCAGGTTATTCCAGGTTAgttaccacttcattactggagggca SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;a.aA.....A...AJ..A........aA........aA...A................aaJaaA....A. Cas9_Wa;a.Baaa.......A.....A....a..A.a.Baa.......Baa..........Aa.a.Ba....a.... Cas9_Cr;................N..NLL............k...N.........NLL....N..NLL....N...N Cpf1_Wa;.........................................L..............k............. Cpf1_Cr;SPG11gtaggagagcatggatctctgggtgCAGATCCTCCCAGATCCTCCatactagcttcccctgaggccagtg SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;a.a.A..HaaC....Haaa.A..aC........aC...J........A........a.aA...a.A.... Cas9_Wa;.a............A.....Ca.a......A..JCaaHaaa...CaaHaa...A...A.Baaaa.....A Cas9_Cr;NL...................k..................................L.........L... Cpf1_Wa;........................M............................................. Cpf1_Cr;BRCA1taatgagctggcatgagtatttgtgCCACATGGCTCCACATGGCTccacatgcaagtttgaaacagaact SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;a.A..aA..Ha.A.....a.A......aA........aa........A...a...aB....aB......J Cas9_Wa;..A.........A...A..........J..Aa.a....A.aa.a....A.aa.a...A..........A. Cas9_Cr;............N..............................k..............L.........L. Cpf1_Wa;................................N.....Lk.....N.........M.........L.... Cpf1_Cr;NCF4gttttcgtcatcgaggtgaagacaaAAGGAGGATCAAGGAGGATCcaagtacctcatctaccgccgctac SEQ_DUP;.........................11111111112222222222......................... DUP_NUM;..A.....a.aa.aB.a......aaHaaC....aaHaaC..J..A.............A..A.....A.. Cas9_Wa;a........Ba..a.Ca..........A..........Ca........CaaH....Aa.a.Ca..Aa.aa Cas9_Cr;...................N......kL.......................................L.. Cpf1_Wa;..k...........N..........N............................................ 
Cpf1_Cr;SLC22A5gaggtgccccacagctgccgccgctACCGGCTCGCCACCGGCTCGCCaccatcgccaacttctcggcgcttg SEQ_DUP;.........................1111111111122222222222......................... DUP_NUM;.A.......A..A..A..A.....aA...A.....aA...A........A..........aa.A...aaa.. Cas9_Wa;.Aa.a......Aaaa.a..A..aa.aa.a..Aa..A.a.aa.aa..a.a.aa.aa.Ca.aa..A.Ba.a..a Cas9_Cr;........................................................................ Cpf1_Wa;...................................N...................L................ Cpf1_Cr;KCNQ1gtggtgttcttcgggacggagtacgTGGTCCGCCTCTGGTCCGCCTCtggtccgccggctgccgcagcaagt SEQ_DUP;.........................1111111111122222222222......................... DUP_NUM;.A......aaa..aa.A...a.aA...A.....aA...a.....aa...a..aa..A..A..AJ..A...a. Cas9_Wa;..A..J......BaJBa....A......A..J..aa.aa.a....aa.aa.a....aa.aa..A..Aa.a.. Cas9_Cr;...L.........................N..NL..................L..........L........ Cpf1_Wa;...............................................N........................ Cpf1_Cr;MYO7AgctgtgaacccctaccagctgctctCCATCTACTCGCCATCTACTCGccagagcacatccgccagtatacca SEQ_DUP;.........................1111111111122222222222......................... DUP_NUM;.aB..........A..a..............A..........A..Ja.a.......a..JA........... Cas9_Wa;H.....A......Aaaa..Aa..A..A.a.aa.Ca..A.aHaa.Ca..A.aHaa....A.a.CaaHaa.... Cas9_Cr;...M........L...................................L....................... Cpf1_Wa;.............................................M....N..N..N.......L....... Cpf1_Cr;CEP57ccacaagccctagccatgccgtggtAGCCAATGTTCAGCCAATGTTCagcttgtcttgcatctaatgaagca SEQ_DUP;.........................1111111111122222222222......................... DUP_NUM;..A.....A....A..a.aA..A.....A....A.....A....A...A....A........aB.a...... Cas9_Wa;.....aa.a...Aaa...Aa...Aa.......Aa.....Ba..Aa.....Ba..A....a...A.Ca..... Cas9_Cr;kL........k..NL...k....L................................NL.........NL... Cpf1_Wa;........N..........N......................N..LN...N.......k...........N. 
Cpf1_Cr;ACADMgaaatggcaatgaaagttgaactagCTAGAATGAGTTACTAGAATGAGTTAccagagagcagcttgggaggttgat SEQ_DUP;.........................11111111111112222222222222......................... DUP_NUM;.aA....aB..A..aB....A..HaB.Ha.A.....HaB.Ha.A......a.a.A..A...aaa.aa..aC....a Cas9_Wa;...A........A.............A...A............A............Aa......A..A........ Cas9_Cr;.......M....k.M........................N..................N............N.... Cpf1_Wa;........N............N.......................L...................Lk......... Cpf1_Cr;GAMTgaggggcaccttgtgtgtctgccgtGGGGCCCAGTCCCGGGGCCCAGTCCCggagccgctggaagacgccgtcatt SEQ_DUP;.........................11111111111112222222222222......................... DUP_NUM;aA......a.a.A...A..a.aaaA....A....aaaA....A....aa.A..A..aaB.a..A..A..J..A... Cas9_Wa;...A.......A.aa........a.JAa......Aaa...aaa....Aaa...aaa....Aa.a.......A.aa. Cas9_Cr;.................................N........................L............L.... Cpf1_Wa;..............................L........LLN...........................N...... Cpf1_Cr;CHRNEccacctcttcggcattgtacgtctgAGAGCTGCGGAGCCAGAGCTGCGGAGCCagggccgggagcccaccccagaagc SEQ_DUP;.........................1111111111111122222222222222......................... DUP_NUM;......aA.J..A...A...a.a.A..a.aa.A...a.A..a.aa.A...aaA..aaa.A.........aB.A....a Cas9_Wa;...A..aa.aa.a.Ba..A.....A..a......A..A....Aa....A..A....Aa....Aa.....Aaa.aaaa. Cas9_Cr;.N.....................L......NL.....N........................................ Cpf1_Wa;............L.............L............L.............N........................ Cpf1_Cr;RMRPgccttcagcacgaaccacgtcctcaGCTTCACAGAGTAGTGCTTCACAGAGTAGTattttatagccctaaagaaattgtg SEQ_DUP;.........................111111111111111222222222222222......................... DUP_NUM;J..A...aB.....A......A......Ha.A..a.A......Ha.A..A.........A.......aB....a.A...I Cas9_Wa;..A...Aa.Ba..A.a...Aa.a..aa.a..A.Ba.a.........A.Ba.a............J....Aaa........ 
Cas9_Cr;..........................NL......................NL.............NL............. Cpf1_Wa;.......................................M.......k...k.........................k.. Cpf1_Cr;ABCC8acagcggtgtgaccaagatatggaaGAGGGAGAGGGAGGCGAGGGAGAGGGAGGCaaaggccacggagggcgagaagtcg SEQ_DUP;.........................111111111111111222222222222222......................... DUP_NUM;.aa.a.a.....aC...aaB.a.aaa.a.aaa.aa.a.aaa.a.aaa.aA....aA....aa.aaa.a.aB.A..aA... Cas9_Wa;A.....A..A.......aa.........................A..............A.....Aa.a......A.... Cas9_Cr;..........................................M..................................... Cpf1_Wa;LLN....L.....L........L.....L.....k........L.........N.............LN..N........ Cpf1_Cr;MMABtctctccagccctcttaccgtctctCGGCCCGGCGGCACACGGCCCGGCGGCACAcggcccggcagaaatgcagcgccga SEQ_DUP;.........................111111111111111222222222222222......................... DUP_NUM;....A..........A......aA...aa.aA.....aA...aa.aA.....aA...aA..aB...A..a.A..a.A..a Cas9_Wa;a.aaa.a.a.aa..Aaa.a...Aa..a.a.a..Aaa..A..A.a.a..Aaa..A..A.a.a..Aaa..A.......A..A Cas9_Cr;L.....L..N..........L.......L........N.......................................... Cpf1_Wa;.............................................k.................................L Cpf1_Cr;WFS1actggctggtcctcgccgcgaagcaGGGCCGTCGCGAGGCTGGGCCGTCGCGAGGCTgtgaagctgcttcgccggtgcttgg SEQ_DUP;.........................11111111111111112222222222222222......................... DUP_NUM;A..aA.....A..a.aB.A..aaA..A..a.a.aA..aaA..A..a.a.aA..a.aB.A..A....A..aa.A...aa.aa. Cas9_Wa;......A...A....aa.a.aa.a....A....Aa..a.a....A....Aa..a.a....A.......A..A.Ba.aa.... Cas9_Cr;.................................................................................. Cpf1_Wa;......................................LN......................L.....N............. Cpf1_Cr;CRB2cgctgggcggcctgcccctgcccttGGCGCGGCCCCGGCCCGGCGCGGCCCCGGCCCggcgcggcccctggcgcccgagagc SEQ_DUP;.........................11111111111111112222222222222222......................... 
DUP_NUM;aaa.aA...A.....A.....aa.a.aA....aA...aa.a.aA....aA...aa.a.aA.....aa.A...aJa.A..... Cas9_Wa;a....A.a....A..Aa..Aaaa..Aaa....A.a..Aaaa..Aaa..A.a..Aaaa..Aaa..A.a..Aaaa...A.aaa. Cas9_Cr;..........N...................................N................................... Cpf1_Wa;...............................................................................L.. Cpf1_Cr;HPS1gtcctgcaggtgctggggcaggtgtGGGCCTCCCCTGCTGGGGGCCTCCCCTGCTGGgggctgtggtcagaaagttcagccg SEQ_DUP;.........................11111111111111112222222222222222......................... DUP_NUM;.A..aa.A..aaaA..aa.a.aaA........A..aaaaA........A..aaaaA..a.aA...aB..A....A..aA..a Cas9_Wa;.a.....aa..A.....A..J..A........JAa.aaaa..A......Aa.aaaa..A......A......a.......Ba Cas9_Cr;....................N.................................L...............L........... Cpf1_Wa;.................................................k................................ Cpf1_Cr;DNAAF2ggtgaccccggagcccgcagccccaGCCACGCAGGTATCGTGCCACGCAGGTATCGTggcctccgtcctccgcgcgactcct SEQ_DUP;.........................11111111111111112222222222222222......................... DUP_NUM;.....aa.A...A..A.....A....A..aA....a.A....A..aA....a.aA.....A......a.a.a.......aHa Cas9_Wa;..........Aaaa....Aaa.a..Aaaa..Aa.a.a.....Ca.J.Aa.a.a.....Ca.J..Aa.aa..aa.aa.a.a.. Cas9_Cr;.......L...................................................M...............M...... Cpf1_Wa;.................................................................................. Cpf1_Cr;RMRPcctaggatacaggccttcagcacgaACCACGTCCTCAGCTTCACCACGTCCTCAGCTTCacagagtagtattttatagccctaa SEQ_DUP;.........................1111111111111111122222222222222222......................... DUP_NUM;aaC....aA...J..A...aB.....A......A.........A......A......Ha.A..A.........A.......aB. Cas9_Wa;...J.aa.......A...Aa.Ba..A.a...Aa.a..aa.a..A.Ba.aa.a..aa.a..A.Ba.a............J....A Cas9_Cr;......................................NL......................NL...............NL... 
Cpf1_Wa;..LN.................................................M.......k...k.................. Cpf1_Cr;RMRPgatacaggccttcagcacgaaccacGTCCTCAGCTTCACAGAGTCCTCAGCTTCACAGAgtagtattttatagccctaaagaaa SEQ_DUP;.........................1111111111111111122222222222222222......................... DUP_NUM;..aA...J..A...aB.....A......A......Ga.A......A......Ga.A..A.........A.......aB....a. Cas9_Wa;aa.......A...Aa.Ba..A.a...Aa.a..aa.a..A.Ba.a.....aa.a..A.Ba.a............J....Aaa... Cas9_Cr;.................................NL......................NL...............NL........ Cpf1_Wa;................................................M.......k...k....................... Cpf1_Cr;PALB2caagacagactgagtctttcaaatgAGCAAGTTGGGGTGTGCAGCAAGTTGGGGTGTGCagcaagttcgtccagcaacttctgt SEQ_DUP;.........................1111111111111111122222222222222222......................... DUP_NUM;...a..Ha.A..........a.A...A..aaaa.a.A..A...A..aaaa.a.A..A...A...A....A........A..aC. Cas9_Wa;.a...A....A...A.....a..Ba.......A.............AJ.a.............AJ.a....Ba..aa..A..A. Cas9_Cr;......N...k....M........................kL............N................N............ Cpf1_Wa;.......N................N................N............N....................M........ Cpf1_Cr;PNKPcgcggctcgcggcgtctgggtttgtGTTGTCGATGGCGACCCGTTGTCGATGGCGACCCgtttcccttgcttcagggctgtctc SEQ_DUP;.........................1111111111111111122222222222222222......................... DUP_NUM;A...a.aa.A..HaaA...a.A..A..aC.aa.a....A..A..aC.aa.a....A........A.....aaA..A.J...... Cas9_Wa;Aa...A.a..A.a.a..A..a.............Ja.....A..AaaH....a.....A..AaaH..Baaa...A.Ba....A. Cas9_Cr;............................................k....N................N................. Cpf1_Wa;.................................................................................... Cpf1_Cr;F12atgaagcctaggggacaccggggtcGGAGGCGCCGCCTGGGTTGGAGGCGCCGCCTGGGTTggggtctggcactgtgccaggtcgc SEQ_DUP;.........................111111111111111111222222222222222222......................... 
DUP_NUM;.A....aaaa.....aaaA..aa.aa.A..A..HaaA..aa.aa.A..A..HaaA..aaaA.J.aA....a.A...aA..a..A.. Cas9_Wa;....A......Aa.......A.aa.....a.....A.aa.aa...........A.aa.aa...........a...A.a....Aa.J Cas9_Cr;................................................................N.................N... Cpf1_Wa;....L.................L............................................................... Cpf1_Cr;YARS2gagcttctttaaacaactcttttaaCTCCTGATCAGACATGACCTCCTGATCAGACATGACctccagtgcatctatgctactgtga SEQ_DUP;.........................111111111111111111222222222222222222......................... DUP_NUM;..........................aC...a....a.......aC...a....a.......a.A.......A.....a.aC.... Cas9_Wa;........A.Ba......A..A.aH.....A.aa...Ca...A....Aa.aa...Ca...A....Aa.aa....A.Ca....A..A Cas9_Cr;............NL.............N...k............k......................................... Cpf1_Wa;..N.............................................................M.k..................L Cpf1_Cr;LDLRcagctggcgctgtgatggtggccccGACTGCAAGGACAAATCTGACTGCAAGGACAAATCTgacgaggaaaactgcggtatgggcg SEQ_DUP;.........................111111111111111111222222222222222222......................... DUP_NUM;.aa.A..a.aC.aa.aA....a...A...aa........a...A...aa........a..a.aaB.....a.aA...aaa.aaaA. Cas9_Wa;aHa.aa..A...A.a...........Aaaa..A..A.....A...Ca..IA..A.....A...Ca..IA........A..A..... Cas9_Cr;.................L....L............................................................... Cpf1_Wa;..........N.....k...........N.....k...........Lk...................................... Cpf1_Cr;KTTNtcgggagatgggaccgtcctgcaggACCGACCACATCTGCAGAACCGACCACATCTGCAGAacctctgcgtggagagcgaggattc SEQ_DUP;.........................111111111111111111222222222222222222......................... DUP_NUM;a.aC.aaa...A....A..aa...a.........A..aB...a.........A..aB......a.a.aa.a.a.aHaaC....a.A Cas9_Wa;.Ca.aIaH..........Aa..aa..A....Aa..Aa.a.Ca..A....Aa..Aa.a.Ca..A....Aa.a..A........A... 
Cas9_Cr;...................................................................................... Cpf1_Wa;.....................N.................N..........L......................N............ Cpf1_Cr;MSH6ttgctaatctcccagaggaagttatTCAAAAGGGACATAGAAAATCAAAAGGGACATAGAAAAgcaagagaatttgagaagatgaatc SEQ_DUP;.........................11111111111111111112222222222222222222......................... DUP_NUM;..........a.aaB.A..........aaa.....aB.........aaa.....aB...A...aHaB....a.aB.aCHaB....A.. Cas9_Wa;....A...A...Ca.aaa............BaH.......A........Ca..I.....A.........A.................. Cas9_Cr;..M.....k..............N.........L..........N..NL....................................... Cpf1_Wa;......k....L...M...k.....k....L...M...k.....N....N.......N....LN........................ Cpf1_Cr;BARD1cttctgcgtggaccttcaggaatttCATACTTTTCTTCCTGTTCACATACTTTTCTTCCTGTTCAcatacttttcttcgtagacatgctt SEQ_DUP;.........................1111111111111111111122222222222222222222......................... DUP_NUM;.a.a.aa.......aaB..J................A..J................A..J..............A..a....A....... Cas9_Wa;.Aa..A.Ba..A.....Aa.Ba.......Ba...A...Ba.Baa...Ba.a...A...Ba.Baa...Ba.a...A...Ba.Ba.....A. Cas9_Cr;.......N......N.........N............NL.......kL.......k..NL....NL.........k..NL....NL.... Cpf1_Wa;.....M...................M...................M..................................k.......N. Cpf1_Cr;MRE11ActgcacctacctttgatctgtctttGAAGTGGTAGGAAAAATGTCGAAGTGGTAGGAAAAATGTCttcttccacatctgattcatctacc SEQ_DUP;.........................1111111111111111111122222222222222222222......................... DUP_NUM;..........aC...A.....aB.a.aA..aaB.....A..aB.a.aA..aaB.....A............I..aC.............. Cas9_Wa;..Ca.aI.A.aa..Aa.....Ca...a......................a...................a.Ba.Baa.a.Ca....BaHC Cas9_Cr;.....k....k........................k..........k.........................................N. Cpf1_Wa;....LN........Lk........LN........Lk...................................N..........N..Lk... 
Cpf1_Cr;PYGLctccacgcccacgatgccgcggatgCTGATCTGCCGCCGCTTCTCCTGATCTGCCGCCGCTTCTCctggtccgtcaggggcttcgccatg SEQ_DUP;.........................1111111111111111111122222222222222222222......................... DUP_NUM;..A.....aC.A..aHaaC.A..aC...A..A..A........aC...A..A..A........aA...A...aaaA....A....aA..a Cas9_Wa;.a..Ba.aa.a.aaa.a....Aa.a.....A...Ca..Aa.aa.a.Ba.aa...Ca..Aa.aa.a.Ba.aa....aa..a.....A.Ba. Cas9_Cr;...N..NL.............N...L.....................................N...................N...... Cpf1_Wa;.......................................................................................... Cpf1_Cr;SLC34A1acccggtggccgggctggtggtgggGATCCTGGTGACCGTGCTGGTGATCCTGGTGACCGTGCTGGTgcagagctccagcacctccacatcc SEQ_DUP;.........................111111111111111111111222222222222222222222......................... DUP_NUM;aa.aA..aaA..aa.aa.aaaaC....aa.a...a.A..aa.aC.....aa.a...a.A..aa.A..a.A..J..A................ Cas9_Wa;..aa..AaaH....Aa...A............CaaH.....Aa...A..J...CaaH.....Aa...A..J..A..J.A.aa..A.aa.aa. Cas9_Cr;....................L....................................................................... Cpf1_Wa;...L........................................................................................ Cpf1_Cr;SLC25A13actccgctgtaagtggtttggccagCCCGGGCAGCCACCTGTAATCTCCCCGGGCAGCCACCTGTAATCTCgtcttgataacatcagcaggggtca SEQ_DUP;.........................1111111111111111111111122222222222222222222222......................... DUP_NUM;.A..A...a.aA...aA...A...aaA..A......A..........aaA..A......A.......A....aC........A..aaaA....... Cas9_Wa;a..Ca.a.aa.a..............Aa..Aaa...A..Aa.aa.....Ca.aaaa...A..Aa.aa.....Ca.aI.a.......A.Ca..A... Cas9_Cr;..M...k.............M.....L.............k.............................L......................... Cpf1_Wa;.....................N......................N...........M.N..................k..............M... 
Cpf1_Cr;PKP2cggccgcctggccgacagtcaagtgCGCTCTCCTCCCGCTGGAATCCACGCTCTCCTCCCGCTGGAATCCAcggcgacactgggcccagcttccct SEQ_DUP;.........................1111111111111111111111122222222222222222222222......................... DUP_NUM;.A...aA..a...A....a.a.A..........A..aaB......A..........A..aaB......aa.a.....aaA....A.........a. Cas9_Wa;aaa..A..Aa.aa...Aa..A...a.....A.aJa.aa.aaa.a.....CaaHa.a.a.aa.aaa.a.....CaaHa..A..A.a....Aaa..A. Cas9_Cr;.......NLL....NLL........................................L..........L...........L..........L.... Cpf1_Wa;...................LLN....................LLN...............................................LLN. Cpf1_Cr;AMHgccgacgggccgtgcgcgctgcgcgAGCTCAGCGTAGACCTCCGCGCCAGCTCAGCGTAGACCTCCGCGCCgagcgctccgtactcatccccgaga SEQ_DUP;.........................1111111111111111111111122222222222222222222222......................... DUP_NUM;..aaA..a.a.a.A..a.a.a.A....a.A..a......a.A...A....a.A..a......a.A..a.a.A.J..A...........a.a..... Cas9_Wa;aa.aa.aa..A...Aa...A.aJa..A.a...A.a..A.....Aa.aa.a.aa..A.a..A.....Aa.aa.a.aa...a.a.aa...a.ahCaaa Cas9_Cr;................................................................L......................L........ Cpf1_Wa;......................................................................................N..N...... Cpf1_Cr;HPS4gatcatggccagacaagcatccgttCTCCTTCCTGCCATCTGGACAAGCCTCCTTCCTGCCATCTGGACAAGCttcgtcaggggatgtgggatctggg SEQ_DUP;.........................111111111111111111111111222222222222222222222222......................... DUP_NUM;..aA...a....A.....A...........A......aa....A..........A......aa....A....A...aaaaC.a.aaaC...aaaa.aA Cas9_Wa;a.aa...Ca....Aa...A...A.CaaH.Ba.aa.Baa..Aa.Ca....A...Aa.aa.Baa..Aa.Ca....A...A.Ba..a.............C Cas9_Cr;....................L......................L..N.....NL......................NL..................NL Cpf1_Wa;....................L...N...................L...N............L......L......................L...... 
Cpf1_Cr;ASLtgcccctggcttcccacagccacgcCGTGGCACTGACCCGAGACTCTGAGCGTGGCACTGACCCGAGACTCTGAGcggctgctggaggtgcggaagcgga SEQ_DUP;.........................11111111111111111111111112222222222222222222222222......................... DUP_NUM;...aA.........A....A..a.aA....a....a.a.....a.a.a.aA....a....a.a.....a.a.aA..A..aa.aa.a.aaB.aHaaC.... Cas9_Wa;aaHaa..Aaaa...A.Baaa.a..Aa.a.aa....A.a...AaaH...A.aH...A....A.a...AaaH...A.aH...A..A..A........A..J. Cas9_Cr;.................................NLL................................................................ Cpf1_Wa;..............................................................L.......LLN...L....N...............L.. Cpf1_Cr;EPG5tcagagcttggtcaggggtgaaagcAGAGTTTATCACCAATTCCCCTTCAATAAGAGTTTATCACCAATTCCCCTTCAATAactctccggagctgggagtcctctt SEQ_DUP;.........................11111111111111111111111111112222222222222222222222222222......................... DUP_NUM;.A...aA...aaaa.aB..A.Ha.A........................Ha.A...............................aa.A..aaa.A........... Cas9_Wa;..Aa.ba....A.....a...........A........Ca.aa...Baaaa.Ba............Ca.aa...Baaaa.Ba.....A.aHaa....A.......a Cas9_Cr;...............k......NL......N......................k.M.......NLL...NL..........k.M.......NLL...NL....... Cpf1_Wa;.................N..........N..N.............N..........N..N.......L......L...........k.....L............. Cpf1_Cr;ABCC8ttgggcacaagaagaaaaaccacatGAGCTGATTGGTGTCGATGGCAACCAGATTAGAGCTGATTGGTGTCGATGGCAACCAGATTAcagatctgtccagcagtcatttctc SEQ_DUP;.........................11111111111111111111111111111112222222222222222222222222222222......................... DUP_NUM;A.....aB.aB..........a.a..aC..aa.a..aC.aA...I..aC...a.a..aC..aa.A..aC.aA...I..aC.....aC...A....A..A............. Cas9_Wa;..........A.a...........Aa.a.....A..........a.J...A..Aa.........A..........a.J...a..Aa......a...ca...aa..A...a.. Cas9_Cr;.......................k...............................N....................N.........N....................N.... 
Cpf1_Wa;.........................N..............................N......................................L........L.....L. Cpf1_Cr;LAMB3cttgccttcggtgtggtcccggcaaTTGTCACACACACCTCCATATGCCCCCTGGCTGGCGGCAAACACAGCGGGGTCAAAGTGACATGTCTCTGAGTGCCCTTGTCACACACACCTCCATATGCCCCCTGGCTGGCGGCAAACACAGCGGGGTCAAAGTGACATGTCTCTGAGTGCCCattgcagtcgcaccctggaaaaaga SEQ_DUP;.........................1111111111111111111111111111111111111111111111111111111111111111111111111111122222222222222222222222222222222222222222222222222222222222222222222222222222......................... DUP_NUM;.....aa.a.aA....aA.....a.J.J..............A......aA..aa.aAJ.......a.aaaA.....a.a....A....Ha.a.A.....a.J.J..............A......aA..aa.aAJ.......a.aaaA.....a.a....A....Ha.a.a......A..a..A......aaB....aHa.A. Cas9_Wa;a...Ba...Aa.Ba.......Jaaa..A......a.a.a.a.aa.aa.....Aaaaa...A...A..A...A.a..A.....a.......A....a.a......AaaJ...a.a.a.a.aa.aa.....Aaaaa...A...A..A...A.a..A.....a.......A....a.a......AaaJ...A...a.a.aaaH.... Cas9_Cr;.....................N..N....NL.........L.......N..............L...M.........................................................N..............L...M..........................................................N Cpf1_Wa;..n..................M....................k..............k........................................M....................k..............K.......................................LLk........................... Cpf1_Cr;

The ±1 base difference in shifts between the Watson and Crick tracks ensures that cleavage positions are to the immediate left of the indicated base in both cases (this would not be an issue if the spaces between bases were labelled rather than the bases themselves).

The Cpf1 cleavage sites are staggered on the two strands, leaving an overhang in the double-stranded break, which is not indicated in these schematics. The cleavage sites are labeled according to the Legend column in the table of PAM sequences below (Table 9), with an upper-case letter if it is the only matching PAM sequence, and a lower-case letter if it is the first of more than one matching PAM sequence.

Motifs are scanned for in flanking regions of 50 bases, whereas the table displays flanking regions of 25 bases, so cleavage sites are shown even if the PAM site itself does not fall within the displayed sequence (as the distance between the cleavage site and the furthest position in the PAM site is no more than 25 bases). The above tracks, from top to bottom, are shown for specific genes. See Table 6.
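The relationship between the scanned and displayed flank sizes can be illustrated with a minimal Python sketch. It is not the procedure used to generate the tracks above: it assumes a single SpCas9-style NGG PAM on the Watson strand with a blunt cut 3 bases 5′ of the PAM, whereas the actual PAM set and legend letters are those of Table 9. The function names, the legend letter "A", and the example sequence are hypothetical.

```python
import re


def watson_cas9_cut_sites(seq, pam_regex="[ACGT]GG", cut_offset=3):
    """Return 0-based indices of bases that have a cut immediately to their left.

    Assumption: SpCas9-like nuclease, PAM read on the Watson strand,
    blunt cut `cut_offset` bases 5' of the first PAM base.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAM matches are all reported.
    for match in re.finditer(f"(?=({pam_regex}))", seq):
        cut = match.start() - cut_offset
        if cut > 0:  # skip PAMs too close to the 5' end to place a cut inside seq
            sites.append(cut)
    return sites


def cut_track(full_seq, scanned_flank=50, shown_flank=25, legend="A"):
    """Build a dot/letter track for the displayed window only.

    PAMs are searched over the whole scanned sequence, but only the
    central window (flanks trimmed to `shown_flank`) is rendered.
    """
    sites = set(watson_cas9_cut_sites(full_seq))
    trim = scanned_flank - shown_flank
    lo, hi = trim, len(full_seq) - trim
    return "".join(legend if i in sites else "." for i in range(lo, hi))


if __name__ == "__main__":
    # Toy sequence: 50-base flanks around a 10-base core, with one PAM
    # placed just outside the displayed window on the right-hand side.
    toy = "A" * 86 + "TGG" + "A" * 21
    print(cut_track(toy))
```

In the toy sequence the PAM occupies positions 86 to 88, outside the displayed window, yet the cut it specifies still appears in the track because scanning covers the wider 50-base flank.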

The variants identified in Table 6 with insertion lengths between 2 and 40 were then prioritized for therapeutic applications, and the following microduplications were identified. See Table 7. The headings of Table 7 are as follows:

-   Sequence ID: arbitrary number assigned to each sequence.
-   VARIANT: of the form CHR-POS-REF-ALT, where CHR is the chromosome, POS is the start position of the reference (REF) allele in GRCh37, and ALT is the alternate allele; variants have been left-normalized with vt (genome.sph.umich.edu/wiki/Vt).
-   INSERT_LENGTH: length in nucleotides of the inserted sequence in the variant. (This is one less than the number of characters in ALT, as the first character of ALT is the REF base within the genome.)
-   ALLELE_ID: allele ID from the ClinVar VCF (version clinvar_20180225.vcf.gz).
-   GENE_INFO: of the form SYMBOL:ENTREZID, from the ClinVar VCF. Note: some variants have more than one gene listed in GENEINFO, and this column shows only the first of them, in the interest of space.
-   CLNDN: the associated disease name from the ClinVar VCF; if more than one disease is listed in the VCF, only the first is shown here, in the interest of space.
-   MAX_AF: allele frequency for the variant from the gnomAD genomes or exomes (version 2.0.2), whichever is larger.
-   Microduplication Sequences: the duplicated sequence annotated with potential CRISPR cut sites. The two copies are enclosed by square brackets and separated by a vertical bar, with 5 flanking bases on either side. A base is shown in lower case if there is a predicted CRISPR cleavage site immediately to the left of the base.
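The VARIANT and Microduplication Sequences conventions described above can be read programmatically. The following Python sketch is illustrative only: the function names and returned dictionary keys are hypothetical, and the "collapsed" value is simply the displayed sequence with one of the two copies removed, as would be expected if the microduplication were resolved to a single copy.

```python
import re


def parse_variant(variant):
    """Split a CHR-POS-REF-ALT string and derive INSERT_LENGTH.

    INSERT_LENGTH is len(ALT) - 1 because the first character of ALT
    is the REF base.
    """
    chrom, pos, ref, alt = variant.split("-")
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "insert_length": len(alt) - 1,
    }


# NNNNN[copy 1|copy 2]NNNNN, with 5 flanking bases on either side.
_MICRODUP = re.compile(r"^([A-Za-z]{5})\[([A-Za-z]+)\|([A-Za-z]+)\]([A-Za-z]{5})$")


def parse_microdup(notation):
    """Parse the bracketed microduplication notation used in Table 7."""
    match = _MICRODUP.match(notation)
    if match is None:
        raise ValueError(f"not in NNNNN[dup1|dup2]NNNNN form: {notation!r}")
    left, copy1, copy2, right = match.groups()
    full = left + copy1 + copy2 + right
    return {
        # Case only encodes cut sites, so compare the copies case-insensitively.
        "copies_match": copy1.upper() == copy2.upper(),
        # Displayed sequence with a single copy of the duplicated unit.
        "collapsed": (left + copy1 + right).upper(),
        # 0-based indices of bases with a predicted cut immediately to their left.
        "cut_sites": [i for i, base in enumerate(full) if base.islower()],
    }


if __name__ == "__main__":
    # Seq. ID 1 of Table 7 (DOK7).
    print(parse_variant("4-3494833-A-AGCCT"))
    print(parse_microdup("GcTcA[gcCt|gCct]gCcaG"))
```

Entries whose brackets or separators were damaged during text extraction will not match the pattern and raise a ValueError, which makes such rows easy to flag for manual review.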

TABLE 7 Preferred Microduplication Sequences For Clinical ApplicationVARIANT NNNNN[Duplication 1|Duplication 2]NNNNN Seq ID (CHR-POS-REF-ALT)INSERT_LENGTH ALLELE_ID GENE_INFO CLNDN MAX_AFlowercase = Cas9/Cpf1 cut site Seq. 4-3494833-A- 4 16312 DOK7:Congenital 0.0011653 GcTcA[gcCt|gCct]gCcaG ID. AGCCT 285489 myasthenic 1syndrome Seq. 9-126135887-T- 16 178866 CRB2: Focal 0.0009596cccTt[ggcGCGGcccCGgccc|Ggcgcg ID. TGGCGCGGCCCC 286204 segmentalGcccCGgccc]Ggcgc 2 GGCCC glomerulosclerosis 9 Seq. 15-72638920-0- 418928 HEXA: Tay- 0.0008041 catAg[gaTA|Gata]tACgG ID. GGATA 3073 Sachs 3disease Seq. 2-1481219-A- 4 421275 TPO: not 0.0006493GGaga[CggC|CgGc]cgcgc ID. ACGGC 7173 provided 4 Seq. 19-47983175-G- 18106552 KPTN: Mental 0.0005026 gcagg[AcCGACcaCatctgcaga|AcCG ID.GACCGACCACAT 11133 retardation, ACcaCatctgCagA]AccTc 5 CTGCAGA autosomalrecessive 41 Seq. 19-50365057-T- 17 19886 PNKP: Early 0.0002277ttTGt[GTtgTcgAtggCGaCCc|GTtgtcg ID. TGTTGTCGATGG 11284 infantileatGgCGaCCc]GTttc 6 CGACCC epileptic encephalopathy 10 Seq.11-126144895-G- 4 101651 FOXREDI: Mitochondrial 0.0002261agagg[gAgt|gaGT]GGctT ID. GGAGT 55572 complex I 7 deficiency Seq.2-38298287-T- 10 79358 CYP1B1: Glaucoma, 0.0002156GcaGt[ggTGgCatGa|gGTGgcatgA]g ID. TGGTGGCATGA 1545 congenital GaAt 8Seq. 1-158651385-G- 3 27886 SPTA1: Elliptocytosis 0.0001939cGCag[cAa|Caa]CTggt ID. GCAA 6708 2 9 Seq. 6-158535858-A- 4 423256SERAC1: 3- 0.0001666 aGaga[AgaT|agat]aGCGA ID. AAGAT 84947methylglutaconic 10 aciduria with deafness, encephalopathy, and Leigh-like syndrome Seq. 9-35658024-A- 15 29250 RMRP: Metaphyseal 0.0001567cctCA[gcttcAcAgaGtAGt|GCTtcAca ID. AGCTTCACAGAG 6023 chondrodysplasia,gagtAGT]ATTTt 11 TAGT McKusick type Seq. 19-7620610-C- 4 21646 PNPLA6:Laurence- 0.0001307 gcAGc[gccA|gCca]Gccgc ID. CGCCA 10908 Moon 12syndrome Seq. 1-26126724-G- 10 190596 SELENON: Eichsfeld 0.0001207ccaTG[ggcCGGgccC|ggccgGgccC]gg ID. GGGCCGGGCCC 57190 type ccg 13congenital muscular dystrophy Seq. 
19-47984017-C- 2 264742 KPTN: not0.0001056 cTaCc[tA|ta]ctggt ID. CTA 11133 provided 14 Seq. 6-1612016-C-6 136584 FOXC1: not 0.0001046 Gagtc[AcGGcg|acggcg]gcggc ID. CACGGCG 2296provided 15 Seq. 7-107412534-C- 3 70627 SLC26A3: Congenital 0.0000969gatcC[tga|tga]tAAAT ID. CTGA 1811 secretory 16 diarrhea, chloride typeSeq. 5-176942945-T- 4 207209 DDX41: Acute 0.0000853GGGGT[CaTC|CaTc]atacg ID. TCATC 51428 myeloid 17 leukemia Seq.7-95751240-G- 23 21042 SLC25A13: Citrullinemia 0.0000853gcCAg[cccGggcaGCCaCCtgTaatCTc| ID. GCCCGGGCAGCC 10165 type IIcccGggcagCcaCCtgTaatCTc]GtcTt 18 ACCTGTAATCTC Seq. 17-37821635-G- 8441954 TCAP: not 0.0000813 agcTg[cGagGtGt|cGaGGtgt]cgGag ID. GCGAGGTGT8557 provided 19 Seq. 9-35658027-T- 10 29249 RMRP: Metaphyseal 0.0000752CAgct[tcAcAgaGtA|tcAcagaGtA]GT ID. TTCACAGAGTA 6023 chondrodysplasia,atT 20 McKusick type Seq. 17-56283862-G- 4 71256 MKS1: Joubert 0.0000732gAcAG[TgcC|TgCc]tGtgg ID. GTGCC 54903 syndrome 21 Seq. 17-48245341-A- 5467925 SGCA: Limb- 0.000069 TACCa[cTegg|cTegG]cagAg ID. ACTCGG 6442girdle 22 muscular dystrophy, type 2D Seq. 14-50100653-A- 16 205407DNAAF2: Kartagener 0.000065 cccCA[gccacgcaGgtatCGT|GccAcgca ID.AGCCACGCAGGT 55172 syndrome GgtatcGT]Ggcct 23 ATCGT Seq. 3-121514389-T-4 393234 IQCB1: Nephron 0.0000648 TGAat[CAAg|caag]catGC ID. TCAAG 9657ophthisis 24 Seq. 20-3199224-A- 8 214745 ITPA: Epileptic 0.0000647tTgcA[cTCageAc|cTcAgcac]cGgGg ID. ACTCAGCAC 3704 encephalopathy, 25early infantile, 35 Seq. 10-100183554-T- 16 20316 HPS1: Hermansky-0.0000647 ggtgT[GGGCCTCccctgctgG|GgGCC ID. TGGGCCTCCCCT 3257 PudlakTCccctgctgG]GgGct 26 GCTGG syndrome 1 Seq. 6-135754331-A- 2 214240 AHI1:Joubert 0.0000646 TGtaa[AC|AC]aaaag ID. AAC 54806 syndrome 3 27 Seq.1-216498866-G- 4 57777 USH2A: Retinitis 0.0000646 gcggg[tggC|TgGC]tgccaID. GTGGC 7399 pigmentosa 28 Seq. 18-21114427-C- 4 410327 NPC1: Niemann-0.0000646 tgGgC[AAgT|aAGT]GCCTC ID. CAAGT 4864 Pick 29 disease type C1Seq. 
9-35658027-T- 7 264540 RMRP: not 0.0000646CAgct[tcacAga|tcAcaga]GtAGT ID. TTCACAGA 6023 provided 30 Seq.6-129571327-A- 8 46903 LAMA2: Merosin 0.0000646aCAgA[acGTGTtC|aCGtgttC]tCCag ID. AACGTGTTC 3908 deficient 31 congenitalmuscular dystrophy Seq. 11-94169012-T- 20 150739 MRE11: Hereditary0.0000646 TctTt[gaaGTggtAggAAaAAtgTc|Gaa ID. TGAAGTGGTAGG 4361 cancer-GtggtAGGAAaAATGTC]TTCTt 32 AAAAATGTC predisposing syndrome Seq.2-215595181-T- 20 133182 BARD1: Hereditary 0.0000609aaTTt[cATActTTTcTtcctGttcA|cataC ID. TCATACTTTTCTT 580 cancer-tTTTcttCctGttca]cAtaC 33 CCTGITCA predisposing syndrome Seq.17-33434458-T-TTA 2 242729 RADS1D: Hereditary 0.0000569caGTt[tA|TA]tCAag ID. 5892 cancer- 34 predisposing syndrome Seq.19-1399807-T- 13 23341 GAMT: Deficiency 0.0000557Gccgt[gggGccCAgtccc|GgggcCCAGt ID. TGGGGCCCAGTC 2593 of ccc]GGagc 35 CCguanidinoacetate methyltransferase Seq. 11-95560975-T- 11 39648 CEP57:Mosaic 0.0000528 GTggt[AGCcAATgtTC|AGCcaAtgttc] ID. TAGCCAATGTTC 9702variegated AgCtt 36 aneuploidy syndrome 2 Seq. 6-112390619-T- 2 21424WISP3: Progressive 0.0000488 CAAgT[aC|ac]Tcaga ID. TAC 8838pseudorheumatoid 37 dysplasia Seq. 2-241808397-G- 2 200432 AGXT: Primary0.0000411 cAtgg[cA|Ca]gccgg ID. GCA 189 hyperoxaluria, 38 type I Seq.7-107335062-G- 4 52676 SLC26A4: Pendred′s 0.0000407ctTgG[cAgT|CagT]CgGtc ID. GCAGT 5172 syndrome 39 Seq. 5-131705914-1- 1147395 SLC22A5: Renal 0.0000385 CCgct[acCggCtcGCc|acCggCtcGCc] ID.TACCGGCTCGCC 6584 carnitine Accat 40 transport defect Seq.22-26860623-1- 24 19168 HPS4: Hermansky- 0.0000366cCgTt[ctcCttCctGccatCtgGacAaGc|c ID. TCTCCTTCCTGCC 89781 PudlakTCcttCctGccatCtgGacAAGc]tTCgt 41 ATCTGGACAAGC syndrome 4 Seq.5-54529099-C- 5 143224 CCNO: Primary 0.0000351 Gggcc[Gggca|Gggca]gGGgGID. CGGGCA 10309 ciliary 42 dyskinesia Seq. 4-6290805-A- 16 19558 WFS1:Diabetes 0.000035 aagcA[GgGccGtCgcGAggcT|GgGcc ID. AGGGCCGTCGCG 7466mellitus GtCgcGAgGct]GtGaa 43 AGGCT AND insipidus with optic atrophy ANDdeafness Seq. 
9-35658017-C- 17 29253 RMRP: Metaphyseal 0.0000327accAC[GtcctCAgcttcAcAga|GtCcTC ID. CGTCCTCAGCTT 6023 chondrodysplasia,agcTtcAcaga]GtAGT 44 CACAGA McKusick type Seq. 1-76226858-G- 13 18626ACADM: Medium- 0.0000325 acTag[ctagAaTGAGTta|ctagAaTgAG ID. GCTAGAATGAGT34 chain TTa]CcAgA 45 TA acylcoenzyme A dehydrogenase deficiency Seq.5-147466073-T- 4 406655 SPINK5: not 0.0000324 Cagat[gTgC|GTGc]acTgt ID.TGTGC 11005 provided 46 Seq. 6-31085224-G- 4 167426 CDSN: Peeling0.0000324 cagtg[aggC|Aggc]aGGgg ID. GAGGC 1041 skin 47 syndrome Seq.5-54529084-C- 5 143228 CCNO: Primary 0.0000324 Gccac[Gggct|GggcT]GggcCID. CGGGCT 10309 ciliary 48 dyskinesia Seq. 2-219646907-T- 10 264076CYP27A1: not 0.0000324 cccAT[ggctGcgcTG|gGcTGcgcTG]g ID. TGGCTGCGCTG1593 provided GcTg 49 Seq. 22-37260985-A- 10 224721 NCF4: Chronic0.0000324 GACaa[aaggAGGAtc|aaggaGgATc] ID. AAAGGAGGATC 4689granulomatous CAAgt 50 disease Seq. 1-154960775-G- 2 226515 FLAD1:Glutaric 0.0000323 ttgag[Gc|Gc]aGtGg ID. GGC 80308 aciduria, 51 type 2Seq. 2-228566952-A- 2 353889 SLC19A3: Basal 0.0000323 TCtCA[Tc|tc]AtgGaID. ATC 80704 ganglia 52 disease, biotin- responsive Seq.8-74888632-C-CGT 2 200167 TMEM70: not 0.0000323 CgGgc[gT|GT]cCtCC ID.54968 provided 53 Seq. 2-152364571-A- 4 29086 NEB: Nemaline 0.0000323gTCCA[AAac|aaAc]aGtCt ID. AAAAC 4703 myopathy 2 54 Seq. 4-15575920-C- 4214183 CC2D2A: Joubert 0.0000323 CCagc[tggT|tgGt]tcctG ID. CTGGT 57545syndrome 9 55 Seq. 6-129835627-T- 4 98901 LAMA2: Merosin 0.0000323aaGAt[cAAA|caAA]caCCg ID. TCAAA 3908 deficient 56 congenital musculardystrophy Seq. 6-158538811-C- 4 211210 SERAC1: not 0.0000323tCcac[AgaT|aGAT]ATAat ID. CAGAT 84947 provided 57 Seq. 12-123738316-T- 4211578 C12orf65: not 0.0000323 TtGtT[ATcC|ATcc]ccagg ID. TATCC 91574provided 58 Seq. 13-20763209-G- 4 186855 GJB2: Deafness, 0.0000323cCaGg[cgTT|cgTt]gcACt ID. GCGTT 2706 autosomal 59 recessive 1A Seq.13-32972540-C- 4 180697 BRCA2: Hereditary 0.0000323aAggC[ATtT|aTtT]CAGcc ID. CATTT 675 cancer- 60 predisposing syndromeSeq. 
2-48033707-T- 8 94955 MSH6: Lynch 0.0000323TTgCt[aATcTCCc|aatctccc]agagg ID. TAATCTCCC 2956 syndrome 61 Seq.12-124171469-G- 8 462224 TCTN2: Meckel- 0.0000323aTCAG[ctcTGcTc|cTcTgcTc]tgcTg ID. GCTCTGCTC 79867 Gruber 62 syndromeSeq. 15-44867171-C- 10 465073 SPG11: Spastic 0.0000323agCAc[TtcCaGgtta|TtccAGgttA]GTT ID. CTTCCAGGTTA 80208 paraplegia 11, ac63 autosomal recessive Seq. 17-41246723-G- 10 70390 BRCA1: Hereditary0.0000323 tTGTG[CCacAtggcT|CCacatgGcT]c ID. GCCACATGGCT 672 cancer- CacA64 predisposing syndrome Seq. 11-76867001-T- 11 408477 MYO7A: not0.0000323 gCTcT[CcAtCtaCtcG|CcAtctaCtcg]C ID. TCCATCTACTCG 4647 providedcAga 65 Seq. 17-4804916-G- 14 468369 CHRNE: Myasthenic 0.0000323gTctg[AgaGctgeGgAgcc|aGagctGcG ID. GAGAGCTGCGGA 1145 syndrome,gAgcc]aGggc 66 GCC congenital, 4a, slow- channel Seq. 7-117188810-A- 4186745 CFTR: Cystic 0.0000294 ctgaa[agat|aGAT]ATTAA ID. AAGAT 1080fibrosis 67 Seq. 5-148407494-A- 4 244469 SH3TC2: not 0.0000244aggCa[ggCc|GgCc]agcag ID. AGGCC 79628 provided 68 Seq. 10-104590547-G- 416816 CYP17A1: Congenital 0.0000214 cttgg[ggaT|gGAT]GCCTt ID. GGGAT 1586adrenal 69 hyperplasia Seq. 20-400315-C- 7 150333 RBCK1: Polyglucosan0.0000213 AgCcc[GacgaGg|gacGagG]aGgAg ID. CGACGAGG 10616 body 70myopathy 1 with or without immunodeficiency Seq. 1-94508433-G-GAC 2359278 ABCA4: Stargardt 0.0000203 aACCg[Ac|ac]aGcTt ID. 24 disease 1 71Seq. 2-73635784-T- 4 393242 ALMS1: Alstrom 0.0000203CagAT[agTA|aGTA]TatcA ID. TAGTA 7840 syndrome 72 Seq. 2-74185326-A- 423194 DGUOK: Mitochondrial 0.0000203 aatga[TgaT|TgAT]TttTc ID. ATGAT1716 DNA- 73 depletion syndrome 3, hepatocerebral Seq. 17-4805917-A- 5422179 CHRNE: not 0.0000203 taCcA[gtgaG|gtgaG]atGAG ID. AGTGAG 1145provided 74 Seq. 5-176813493-G- 21 27972 SLC34A1: Fanconi 0.0000203gtgGG[GAtcCtGgtgacCGtgctgGT|gA ID. GGATCCTGGTGA 6569 renotubulartcCtGgtgacCGtgctGGt]gcAga 75 CCGTGCTGGT syndrome 2 Seq. 22-29115401-A- 4185622 CHEK2: Hereditary 0.000018 TGacA[tgat|tgat]GTAtT ID. 
ATGAT 11200cancer- 76 predisposing syndrome Seq. 2-48033769-A- 4 94970 MSH6:Hereditary 0.0000166 ATGAa[TCAg|TcaG]tcact ID. ATCAG 2956 nonpolyposis77 colon cancer Seq. 17-33903147-A- 4 358425 PEX12: Infantile 0.0000163GgatA[AggC|AGGC]aACAc ID. AAGGC 5193 Refsum's 78 disease Seq.17-41244495-T- 5 69430 BRCA1: Hereditary 0.0000163aATgt[TCTCA|tcTcA]ttTcc ID. TTCTCA 672 cancer- 79 predisposing syndromeSeq. 9-35658012-A- 17 29260 RMRP: Metaphyseal 0.0000162cacGA[accACGtcctCAgcttC|AcCacG ID. AACCACGTCCTC 6023 chondrodysplasia,tcctCagcTtc]Acaga 80 AGCTTC McKusick type Seq. 12-32902980-A- 18 414707YARS2: Mitochondrial 0.0000162 TtTaa[CtcCTgatcAGacaTGAc|CtcCt ID.ACTCCTGATCAG 51067 diseases gatcAGacaTGAc]CtCca 81 ACATGAC Seq.11-17470110-T- 31 429214 ABCC8: Persistent 0.0000162CaCat[gagCTgaTtGGtgTcgATGgCaa ID. TGAGCTGATTGG 6833 hyperinsulinemiccCaGatta|gagCTgaTtGGtgTcgATGgc 82 TGTCGATGGCAA hypoglycemia ofaacCaGAtta]CAGaT CCAGA infancy Seq. 16-1841827-C- 9 23168 IGFALS: Acid-0.0000142 AgCtc[Gcgcaggct|gcgcAggCt]GcccA ID. CGCGCAGGCT 3483 labile 83subunit deficiency Seq. 15-40268931-G- 4 414416 EIF2AK4: Familial0.0000134 tGGAG[CACT|Cact]tcggg ID. GCACT 440275 pulmonary 84 capillaryhemangiomatosis Seq. 6-38850803-T-TAG 2 456184 DNAH8: Primary 0.0000128tTgat[Ag|Ag]aCACc ID 1769 ciliary 85 dyskinesia Seq. 9-21971020-A- 3182930 CDKN2A: Hereditary 0.0000128 GggCa[GAc|gAC]GgCcc ID. AGAC 1029cancer- 86 predisposing syndrome Seq. 2-234669554-G- 4 428001 UGT1A:Crigler- 0.0000122 tCctG[cagc|cagC]ggGtg ID. GCAGC 7361 Najjar 87syndrome Seq. 1-5935033-T- 5 101577 NPHP4: not 0.0000122GgcGt[GgaGc|GgAgc]gTGTg ID. TGGAGC 261734 provided 88 Seq. 7-92134156-A-9 99009 PEX1: not 0.0000122 CtGCa[tCCacactg|tCcAcActG]cctct ID.ATCCACACTG 5189 provided 89 Seq. 11-17452431-A- 15 214503 ABCC8: not0.0000122 Tggaa[gAgggagAggGaGgc|gAgggag ID. AGAGGGAGAGG 6833 providedAggGAGgc]aAAGg 90 GAGGC Seq. 2-152354227-T- 4 448783 NEB: Nemaline0.0000106 tgtat[aAcA|AACa]CcTgt ID. 
TAACA 4703 myopathy 2 91 Seq.19-47258867-C- 4 267103 FKRP: Congenital 0.0000088 GgtgC[ggga|gGGa]gTtcGID. CGGGA 79147 muscular 92 dystrophy- dystroglycanopathy(with or without mental retardation) type B5 Seq. 4-995488-C- 5 26960IDUA: Mucopolysaccharidosis 0.0000088 GaTgc[CTGct|cTgct]cggaG ID. CCTGCT3425 type I 93 Seq. 5-17683I303-C- 18 390679 F12: Hereditary 0.0000087gggtc[gGaGgcGCcgcctgggtt|gGaGGc ID. CGGAGGCGCCGC 2161 angioneuroticGCcgcctgggtt]GgGgt 94 CTGGGTT edema with normal C1 esterase inhibitoractivity Seq. 2-179442173-G- 4 391746 TTN: Limb- 0.0000082caaTG[gTAT|GTAT]CcCct ID. GGTAT 7273 girdle 95 muscular dystrophy,type 2J Seq. 12-109998851-T- 15 200227 MMAB: not 0.0000082tCtcT[CggcCcgGcggCacA|CggcCcg ID. TCGGCCCGGCGG 326625 providedGcggCacA]CggcC 96 CACA Seq. 17-41244840-C- 2 46039 BRCA1: Hereditary0.0000081 TtcaC[at|aT]tCaAa ID. CAT 672 cancer- 97 predisposing syndromeSeq. 3-48508650-G- 3 19220 TREX1: Aicardi 0.0000081 gAggG[tgA|tGA]TGtcCID. GTGA 11277 Goutieres 98 syndrome 1 Seq. 2-26415259-C- 4 199995HADHA: not 0.0000081 cctcc[tgat|tgaT]aGAtg ID. CTGAT 3030 provided 99Seq. 2-148696793-C- 4 39252 ORC4: Meier- 0.0000081 GatGc[TgtT|tgTT]acTcgID. CTGTT 5000 Gorlin 100 syndrome 2 Seq. 3-48508676-1- 4 131925 TREX1:Aicardi 0.0000081 catCT[gTca|gTca]GtGgA ID. TGTCA 11277 Goutieres 101syndrome 1 Seq. 8-55537899-G- 4 21010 RP1: Retinitis 0.0000081tataG[Tgaa|tgAA]gaaaG ID. GTGAA 6101 pigmentosa 1 102 Seq.11-108119732-T- 4 264571 ATM: Ataxia- 0.0000081 tgaTt[ACag|AcaG]TGtCcID. TACAG 472 telangiectasia 103 syndrome Seq. 15-44876096-U- 10 409235SPG11: Spastic 0.0000081 ggGtg[CAGATcCTcc|cagatcCTCc]at ID. GCAGATCCTCC80208 paraplegia 11, act 104 autosomal recessive Seq. 19-55673062-U- 5404089 DNAAF3: Primary 0.0000076 gccgg[gcgtC|GcGTc]Gtagc ID. GGCGTC352909 ciliary 105 dyskinesia Seq. 19-50364929-G- 5 203579 PNKP: Early0.0000071 gCccg[CggcT|cggct]cGGgc ID. GCGGCT 11284 infantile 106epileptic encephalopathy 10 Seq. 
13-32911149-A- 2 248942 BRCA2: Breast-0.0000044 gacaa[tg|tG]AGAAT ID. ATG 675 ovarian 107 cancer, familial 2Seq. 13-32912466-C- 4 66246 BRCA2: Hereditary 0.0000044tatAC[TgCt|tgCt]gCCag ID. CTGCT 675 cancer- 108 predisposing syndromeSeq. 7-65551736-C- 25 200151 ASL: not 0.0000044CacGc[cGtGgcACtgaCcCGAgactcTg ID. CCGTGGCACTGA 435 providedag|cgTGGcACTGaCcCGAgactctgaG] 109 CCCGAGACTCTG cgGCt AG Seq.17-4802524-C- 7 467870 CHRNE: Myasthenic 0.0000043gcCcC[TcaaaCa|TCAaaca]CGAgC ID. CTCAAACA 1145 syndrome, 110congenital, 4a, slow- channel Seq. 19-2251669-G- 23 23664 AMH:Persistent 0.0000043 gcgcG[AgctcAGcGtAGaCcTcCgcgcc| ID. GAGCTCAGCGTA 268mullerian AgctcAGcGtAGaCcTcCgcGcc]gagc 111 GACCTCCGCGCC duct g syndrome,type I Seq. 14-77757715-G- 2 266684 POMT2: Congenital 0.0000042aaTag[gt|gt]GGTga ID. GGT 29954 muscular 112 dystrophy-dystroglycanopathy with brain and eye anomalies, type A2 Seq.17-41242962-T- 2 69791 BRCA1: Hereditary 0.0000042 taccT[gA|ga]GTGGt ID.TGA 672 cancer-  113 predisposing syndrome Seq. 2-48026541-G-GGT 2 94667MSH6: Hereditary 0.0000041 ccCtg[gt|gt]gCaga ID. 2956 nonpolyposis 114colon cancer Seq. 5-13793822-C-CTA 2 395008 DNAH5: Primary 0.0000041CgcAC[tA|tA]tctca ID. 1767 ciliary 115 dyskinesia Seq. 7-117243689-G- 268231 CFTR: Cystic 0.0000041 cgtgG[gA|ga]gTagC ID. GGA 1080 fibrosis 116Seq. 16-23647131-T- 2 478268 PALB2: Hereditary 0.0000041GTagt[CG|CG]ccCtg ID. TCG 79728 cancer- 117 predisposing syndrome Seq.17-41245704-G- 2 249143 BRCA1: Breast- 0.0000041 aGAAG[AC|AC]TTcCt ID.GAC 672 ovarian 118 cancer, familial 1 Seq. 2-26502064-C- 4 425487HADHB: not 0.0000041 CtgGC[cgCT|cgCt]gcctT ID. CCGCT 3032 provided 119Seq. 2-48030716-T- 4 214607 MSH6: Hereditary 0.0000041TTCct[aatg|AATg]aCATT ID. TAATG 2956 nonpolyposis 120 colon cancer Seq.2-48032775-T- 4 451638 MSH6: Hereditary 0.0000041 tttgt[tgaA|TgAA]tTaaGID. TTGAA 2956 nonpolyposis 121 colon cancer Seq. 2-48033355-C- 4 419562MSH6: Hereditary 0.0000041 ACTgC[AaCa|aACa]tTtga ID. 
CAACA 2956 cancer-122 predisposing syndrome Seq. 2-48033448-C- 4 231582 MSH6: Hereditary0.0000041 caTTc[aTTa|atta]gTagA ID. CATTA 2956 cancer- 123 predisposingsyndrome Seq. 2-48033767-G- 4 182191 MSH6: Hereditary 0.0000041agATG[AATC|AaTc]aGtca ID. GAATC 2956 nonpolyposis 124 colon cancer Seq.2-215646137-A- 4 232427 BARD1: Hereditary 0.0000041TcTGA[cttT|ctTT]ctTAc ID. ACTTT 580 cancer- 125 predisposing syndromeSeq. 3-48626421-C- 4 411522 COL7A1: Recessive 0.0000041Caggc[tCAg|tcaG]Gggct ID. CTCAG 1294 dystrophic 126 epidermolysisbullosa Seq. 4-6302429-C- 4 211040 WFS1: not 0.0000041GTacC[Tgat|tgaT]TGacA ID. CTGAT 7466 provided 127 Seq. 9-6553398-G- 4266395 GLDC: Non- 0.0000041 agatG[gaAC|GaaC]tGGAg ID. GGAAC 2731 ketotic128 hyperglycinemia Seq. 9-36246039-C- 4 265527 GNE: Inclusion 0.0000041catgc[gaAT|gaAt]GATGC ID. CGAAT 10020 body 129 myopathy 2 Seq.19-11222244-A- 4 228160 LDLR: Familial 0.0000041 ctgga[gggT|ggGT]ggCTaID. AGGGT 3949 hypercholesterolemia 130 Seq. 11-47367805-A- 5 178182MYBPC3: Cardiomyopathy 0.0000041 tAgCA[tGccG|tGccG]cgcaG ID. ATGCCG 4607131 Seq. 1-45798772-C- 6 232268 MUTYH: Hereditary 0.0000041CctgC[atcCaT|atccat]ccggt ID. CATCCAT 4595 cancer- 132 predisposingsyndrome Seq. 5-131726404-1- 6 359651 SLC22A5: not 0.0000041ctAtt[tTGGgc|tTgggC]tTtCg ID. TTTGGGC 6584 provided 133 Seq.20-25288616-G- 7 15065 ABHD12: Polyneuropathy, 0.0000041ggATG[GCTctta|GcTcTTa]gcTtc ID. GGCTCTTA 26090 hearing 134 loss, ataxia,retinitis pigmentosa, and cataract Seq. 13-32918751-C- 8 262824 BRCA2:Breast- 0.0000041 AAAtc[aAgaaAaa|AaGAAAAA]TC ID. CAAGAAAAA 675 ovarianCTt 135 cancer, familial 2 Seq. 19-11216255-A- 8 245718 LDLR: Familial0.0000041 GGaca[AaTcTGAc|aaTctGac]gAGga ID. AAATCTGAC 3949hypercholesterolemia 136 Seq. 11-2591894-G- 11 247639 KCNQ1: Long QT0.0000041 gtacG[TGgtcCgcctc|TggTcCGCctc]tg ID. GTGGTCCGCCTC 3784syndrome 1 gTc 137 Seq. 16-23641191-G- 17 244957 PALB2: Hereditary0.0000041 aAatg[AgCAagttGgGgTGtgc|AgCAa ID. 
GAGCAAGTTGGG 79728 cancer-gttGgGgtGtGC]AgCAa 138 GTGTGC predisposing syndrome Seq. 19-11216242-C-18 18772 LDLR: Familial 0.0000041 GccCC[gactgcAaGgaCAAaTcT|gAct ID.CGACTGCAAGGA 3949 hypercholesterolemia gcaaGGaCAAaTct]GacgA 139 CAAATCTSeq. 2-48033727-T- 19 94960 MSH6: Hereditary 0.0000041GTTaT[tCaaaagggacaTaGaAAA|tCaa ID. TTCAAAAGGGAC 2956 nonpolyposisaagGGacaTaGaAAa]gcaaG 140 ATAGAAAA colon cancer Seq. 14-51411077-G- 201871 PYGL:31 Glycogen 0.0000041 ggatg[cTGaTctGCcgCCgcTtctc|Ctgat ID.GCTGATCTGCCG 5836 storage ctGCcgCCgcTtctc]Ctggt 141 CCGCTTCTC disease,type VI Seq. 12-33030862-G- 23 398966 PKP2: Arrhythmogenic 0.0000041aagTg[CGCTCtCctcCcgctggaatcCA| ID. GCGCTCTCCTCC 5318 rightCgctctCctcCcgctggAaTccA]cggcg 142 CGCTGGAATCCA ventricularcardiomyopathy, type 9 Seq. 18-43493697-C- 28 469361 EPG5: Absent0.0000041 Aaagc[AGAgtttATCACCaaTtcCCctt ID. CAGAGTTTATCA 57724 corpuscaaTa|aGagTTtATCaccaaTtcCCCttca 143 CCAATTCCCCTTC callosum aTa]aCtctAATA cataract immunodeficiency

As noted earlier, there are over 2000 duplications annotated as pathogenic in ClinVar that do not appear in gnomAD at all and hence are not listed in Table 6 above, but that may nonetheless be promising candidates for MMEJ. In particular, the developers of gnomAD have “made every effort to exclude individuals with severe pediatric diseases from the gnomAD data set” (gnomad.broadinstitute.org/faq); because of this, allele frequencies for dominant diseases in particular may be underestimated in gnomAD, or the variants may be entirely absent. To illustrate some of the potential MMEJ candidates of this sort, Table 8 below lists those duplications of length 4-20 that satisfy all of the conditions from column dup2iP of Table 4 except that they are absent from gnomAD, and for which the OMIM ID associated with the ClinVar entry is listed as having an autosomal dominant mode of inheritance. The columns are the same as for Table 7, except that the MAX_AF column is excluded, as these variants do not appear in gnomAD.
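The selection described above can be sketched as a simple filter. This is a hedged illustration rather than the pipeline actually used: the record type, its field names, and the inheritance labels are hypothetical, and the remaining dup2iP conditions from Table 4 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinVarDup:
    """Hypothetical record for one pathogenic duplication."""
    variant: str              # CHR-POS-REF-ALT
    max_af: Optional[float]   # None when the variant is absent from gnomAD
    inheritance: str          # mode of inheritance from the linked OMIM entry


def insert_length(variant):
    """Length of the inserted sequence: len(ALT) - 1."""
    return len(variant.split("-")[3]) - 1


def is_table8_candidate(dup, min_len=4, max_len=20):
    """Keep duplications of length 4-20 that are absent from gnomAD and
    linked to an autosomal dominant disease.

    Assumption: any further Table 4 (dup2iP) conditions are applied elsewhere.
    """
    return (
        min_len <= insert_length(dup.variant) <= max_len
        and dup.max_af is None
        and dup.inheritance == "autosomal dominant"
    )


if __name__ == "__main__":
    # Variant strings are taken from Tables 7 and 8; the inheritance
    # labels are illustrative values supplied for this example.
    records = [
        ClinVarDup("1-149898309-T-TTGGG", None, "autosomal dominant"),
        ClinVarDup("4-3494833-A-AGCCT", 0.0011653, "autosomal recessive"),
    ]
    print([r.variant for r in records if is_table8_candidate(r)])
```

Representing absence from gnomAD as `None` rather than as a frequency of zero keeps "not observed" distinct from "observed at a very low frequency", which is the distinction this selection relies on.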

TABLE 8 Additonal Microduplication SequencesAssociated with Autosomal Dominant Diseases NNNNN[Duplication 1|[Duplication 2]NNNNN Sequence INS_ ALLELE GENE lowercase = Cas9/Cpf1 IDVARIANT LEN ID INFO CLNDN cut site B1 1-149898309-T- 4 205400 SF3B4:Nager syndrome Aggat[Tggg|TgGG]agcag TTGGG 10262 B2 2-16082251-A- 4426707 MYCN:4613 Feingold TtTGa[CTCg|CtCg]ctAca ACTCG syndrome 1 B32-48033635-T- 4 94949 MSH6:2956 Hereditary agAcT[ATTa|AtTa]CGtTC TATTAnonpolyposis colon cancer B4 2-145156539-T- 4 442547 ZEB2:9839Mowat-Wilson GttAt[ggAg|GGaG]TCcaT TGGAG syndrome B5 2-145156576-A- 4101526 ZEB2:9839 Mowat-Wilson atAaa[gAgT|GAGT]Ctttt AGAGT syndrome B62-166848432-T- 4 187710 SCN1A:6323 Severe agGat[gACC|gAcc]gcGat TGACCmyoclonic epilepsy in infancy B7 2-166848788-G- 4 187727 SCN1A:6323Severe aaGGG[ACAT|aCaT]CAtca GACAT myoclonic epilepsy in infancy B82-166897853-G- 4 187812 SCN1A:6323 Severe aaaTg[ggTC|GgTC]cATCa GGGTCmyoclonic epilepsy in infancy B9 3-128200679-A- 4 213545 GATA2:2624Lymphedema, TaGta[gAgg|GAgG]CcaCa AGAGG primary, with myelodysplasia B103-128200780-G- 4 227183 GATA2:2624 Dendritic cell, tctgG[cggC|CgGC]cgactGCGGC monocyte, B lymphocyte, and natural killer lymphocyte deficiencyB11 5-36985675-C- 4 207223 NIPBL: Cornelia de cgagC[tgaA|tGaA]GCCTtCTGAA 25836 Lange syndrome 1 B12 5-37019478-C- 4 207238 NIPBL:Cornelia de AAAAc[ACTg|actg]AgacT CACTG 25836 Lange syndrome 1 B135-37045600-T- 4 428429 NIPBL: Cornelia de cAcTt[cTaA|ctAA]CaAAc TCTAA25836 Lange syndrome 1 B14 5-112173974-T- 4 453932 APC:324 FamilialAGTgt[CAgc|cagc]caTtc TCAGC adenomatous polyposis 1 B15 5-176696704-A- 4207196 NSD1:64324 Sotos TCtaa[tgAC|TGAC]taTtt ATGAC syndrome 1 B166-7578754-A- 4 456063 DSP:1832 Arrhythmogenic TtaCa[ggtt|GGTT]CtTaaAGGTT right ventricular cardiomyopathy, type 8 B17 6-42689651-T- 4 28213PRPH2:5961 Patterned cCggt[agTA|AGTA]CTtCA TAGTA dystrophy of retinalpigment epithelium B18 7-155604807-G- 4 76761 SHH:6469 Holo-gccag[Cagc|CaGC]agCAt GCAGC prosencephaly 3 B19 
9-135779065-T- 4 459513TSC1:7248 Tuberous gagct[gCtg|Gctg]ctttG TGCTG sclerosis 1 B2010-76789950-G- 4 39483 KAT6B:23522 Young Simpson caACG[cCAa|ccaA]catTGGCCAA syndrome B21 10-88678975-C- 4 397998 BMPRIA:657 JuvenileAGcTc[tAtT|tatt]tgaTt CTATT polyposis syndrome B22 11-31823171-C- 4424532 PAX6:5080 Aniridia 1 CAagc[aAAg|aAag]Atgga CAAAG B2311-31823224-G- 4 485944 PAX6:5080 Aniridia 1 ttCtg[gagT|GaGt|CGCtA GGAGTB24 11-44129659-C- 4 264541 EXT2: Multiple TGaAC[tgCT|tgCt]cATgG CTGCT2132 exostoses type 2 B25 11-47372123-C- 4 248635 MYBPC3: FamilialTggCc[TcAg|tcag]cagGg CTCAG 4607 hypertrophic cardio- myopathy 4 B2612-32977056-T- 4 399705 PKP2:5318 Arrhythmogenic cTtCt[cAtc|caTc]gctttTCATC right ventricular cardiomyopathy, type 9 B27 12-114832576- 4462429 TBX5:6910 Aortic valve ctaTa[aACg|AAcg]CAGtc A-AAACG disease 2B28 13-32900728-C- 4 66682 BRCA2:675 Breast-ovarianCcaCc[ctta|ctta]GttCt CCTTA cancer, familial 2 B29 13-32903617-T- 4180558 BRCA2:675 Familial Ctcat[gatA|GATA]CtACt TGATA cancer of breastB30 13-32906470-G- 4 234629 BRCA2:675 Hereditary gAAAG(TCaA|Tcaa)tgcCaGTCAA cancer- predisposing syndrome B31 13-32906777-G- 4 183659BRCA2:675 Hereditary aGTTG[taCC|tACc]gTctt GTACC cancer- predisposingsyndrome B32 13-32907062-G- 4 261077 BRCA2:675 HereditarytcttG[cagt|caGt]AaagC GCAGT breast and ovarian cancer syndrome B3313-32910470-A- 4 261113 BRCA2:675 Breast- cTtAa(cTAg|CtAG]ctCtt ACTAGovarian cancer, familial 2 B34 13-32913932-G- 4 261296 BRCA2:675 Breast-aCtTG[TgAc|TGaC]taGct GTGAC ovarian cancer, familial 2 B3513-32914339-T- 4 66619 BRCA2:675 Hereditary GtgAT[gttA|gTTA]gttTG TGTTAcancer-predisposing syndrome B36 13-32914758-A- 4 180628 BRCA2:675Familial acTGa[gcAT|gcaT]AgtCT AGCAT cancer of breast B37 13-32914858-G-4 183912 BRCA2:675 Hereditary aaatG[gaaA|GAAa]AAaCC GGAAA cancer-predisposing syndrome B38 13-32929209-G- 4 261402 BRCA2:675 Breast-cTTTGttTCC|TTcc]aCctt GTTCC ovarian cancer, familial 2 B3913-32936669-T- 4 131698 BRCA2:675 Hereditary 
Tgtgt[gaCa|GaCa]ctcca TGACAbreast and ovarian cancer syndrome B40 13-32936731-G- 4 261438 BRCA2:675Breast- agaTg[gaTC|GaTc]atatg GGATC ovarian cancer, familial 2 B4113-32937507-A- 4 67186 BRCA2:675 Breast- acaga[tggg|Tggg]TGgTA ATGGGovarian cancer, familial 2 B42 13-32944599-C- 4 261474 BRCA2:675 Breast-tgacc[CTAg|ctaG]AccTT CCTAG ovarian cancer, familial 2 B4313-32950917-G- 4 131727 BRCA2:675 Hereditary ccCAG[ctta|ctTA]ccttg GCTTAcancer- predisposing syndrome B44 14-95577661-G- 4 463789 DICER1:DICER1- ttttg[ggTA|ggta]GcACT GGGTA 23405 related pleuropulmonaryblastoma cancer predisposition syndrome B45 16-2121893-C- 4 27434TSC2:7249 Tuberous TGccC[TAct|taCt]cCCtG CTACT sclerosis syndrome B4616-23637659-A- 4 180720 PALB2: Hereditary CTTta[cAAc|CaaC]cgGCt ACAAC79728 cancer- predisposing syndrome B47 17-29559831-A- 4 425126 NF1:4763Neuro- aGgca[CTgt|ctgt]AcGGT ACTGT fibromatosis, type 1 B4817-29657434-T- 4 401825 NF1:4763 Neuro- TCTCT(AtTa|aTtA]gTAAg TATTAfibromatosis, type 1 B49 17-42426621-G- 4 31049 GRN:2896 FrontotemporalgcCtG[CtgC|Ctgc]ctGGa GCTGC dementia, ubiquitin- positive B5017-59763312-T- 4 402202 BRIP1: Familial Ctggt(gaTA|gata1GATga TGATA83990 cancer of breast B51 18-48586254-T- 4 361 SMA JuvenileGagCT[tgCa|TgCa]TtccA TTGCA 59 D4:4089 polyposis syndrome B5219-11200255-A- 4 245343 LDLR:3949 Familial CTGgA[cCgT|CcgT]CgcCT ACCGThyper cholesterolemia B53 19-11216012-C- 4 245559 LDLR:3949 Familialctgcc[cggt|cgGT]GcTCa CCGGT hyper- cholesterolemia B54 19-11218164-G- 4245855 LDLR:3949 Familial gacTg[gTcA|gtCa]gATgA GGTCA hyper-cholesterolemia B55 19-11222196-A- 4 245974 LDLR:3949 FamilialatcGa[tgag[TgAG]tgtca ATGAG hyper- cholesterolemia B56 19-11222247-G- 4245999 LDLR:3949 Familial gaggG[TggC(TggC]taCaa GTGGC hyper-cholesterolemia B57 19-11224220-T- 4 228168 LDLR:3949 Familialagctt[gAcA|gAca]GAGcc TGACA hyper- cholesterolemia B58 19-11224266-G- 4246144 LDLR:3949 Familial cagaG[aCaT|ACat]cCagg GACAT hyper-cholesterolemia B59 19-11226884-C- 4 246281 LDLR:3949 
FamilialTCAcc[ctAg|ctaG]gtATg CCTAG hyper- cholesterolemia B60 19-11231165-C- 4246517 LDLR:3949 Familial CatGC[tgCT|tgCt]GgccA CTGCT hyper-cholesterolemia B61 19-11233960-C- 4 390629 LDLR:3949 FamilialCtcCc[ggct|gGCt]gcctG CGGCT hyper- cholesterolemia B62 19-11240244-T- 418744 LDLR:3949 Familial ggctt[aaga|aAGA]ACaTC TAAGA hyper-cholesterolemia B63 22-24133984-G- 4 469954 SMARCB1: Rhabdoidacaag[Agat|aGAT]ACcCC GAGAT 6598 tumor predisposition syndrome 1 B6422-29121073-C- 4 222871 CHEK2: Familial tGaTC[tTCt|ttct]AtgtA CTTCT11200 cancer of breast B65 22-29130554-G- 4 471012 CHEK2: FamilialgaGAG[gACT|gaCt]ggCtG GGACT 11200 cancer of breast B66 22-29886645-A- 4227465 NEFH:4744 Charcot-Marie- CAgCa(agCc|aGCc]tccag AAGCCTooth disease, axonal, type 2CC B67 22-41572424-C- 4 247756 EP300:2033Rubinstein- ccacC[atgt|aTGT]gCAtg CATGT Tayb1 syndrome 2 B681-153800743-G- 5 423707 GATAD2B: Mental cCAGG[acATC|acAtc]AtCtc GACATC57459 retardation, autosomal dominant 18 B69 1-155317481-T- 5 440033ASH1L: MENTAL tTCat[cctTg|ccttg]tagAG TCCTTG 55870 RETARDATION,AUTOSOMAL DOMINANT 52 B70 2-166848901-A- 5 187732 SCN1A:6323 SevereCGAaA[tACTl|TAcTT]TtcTa ATACTT myoclonic epilepsy in infancy B713-39453229-G- 5 75607 RPSA:3921 Asplenia, gggaG[GtcaT|GtcAT|gccTG GGTCATisolated congenital B72 3-136162210-A- 5 431522 STAG1: STAG 1-TCcca[TaaaC|tAaac]TGTCc ATAAAC 10274 related disorder B73 3-181430665-A-5 272798 SOX2:6657 Microphthalmia CAgcA[tGatG|tGAtg]CAGGa ATGATGsyndromic 3 B74 5-86682703-T- 5 239865 RASA1:5921 CapillaryCATGt[tTTTA|TttTa]gATgA TTTTTA malformation- arteriovenous malformationB75 8-116617181-A- 5 432085 TRPS1:7227 Trichorhino-tggcA[atctG|atctG|gtgtt AATCTG phalangcal dysplasia type I B7611-2797207-C- 5 442554 KCNQ1:3784 Long QT Cttac[gatGt|GaTgt]gCGgG CGATGTsyndrome 1 B77 11-31811508-A- 5 191325 PAX6:5080 Aniridia 1TgAGA[CAtat|cAtat]caGGt ACATAT B78 11-31812376-G- 5 461519 PAX6:5080Aniridia 1 tgCaG[gagtA|GagTa]tGagG GGAGTA B79 11-64577297-G- 5 398421MEN1:4221 Multiple 
tcTgg[gcGgt|GcgGt]Gaagc GGCGGT endocrine neoplasia,type 1 B80 13-32900712-T- 5 261019 BRCA2:675 Breast-tCtTT[agctA|AGctA]CAcCA TAGCTA ovarian cancer, familial 2 B8113-32932021-T- 5 261432 BRCA2:675 Breast- TGGCt|cataC|cATAc]cc TCATACovarian TCc cancer, familial 2 B82 15-48718033-A- 5 400146 FBN1:2200Marfan aaAca[tCGtg|tcGtG]AataA ATCGTG syndrome B83 17-29653013-T- 5467484 NF1:4763 Neuro- Tgctt[ACgaC|acGac]AaCgt TACGAC fibromatosis,type 1 B84 17-48264062-G- 5 413977 COL1A1: OsteogenesistGcgG[cTgcC|cTGcc]ctctg GCTGCC 1277 imperfecta type I B85 19-11216237-G-5 245693 LDLR:3949 Familial tggTG|GccCC|Gcccc]gactg GGCCCC hyper-cholesterolemia B86 19-11224008-T- 5 246065 LDLR:3949 FamilialAcgCTIGgacC|Ggacc]ggagc TGGACC hyper- cholesterolemia B87 19-11230868-C-5 246422 LDLR:3949 Familial TCcCc[agagg|agAGG]atATG CAGAGG hypercholesterolemia B88 19-11230879-G- 5 246429 LDLR:3949 FamilialtATGG(TTCTC|Ttctc]TtcCa GTTCTC hyper cholesterolemia B89 19-11233887-C-5 390628 LDLR:3949 Familial CcaCc[gtcag|gtcAG]GCtaA CGTCAG hypercholesterolemia B90 19-13136144-A- 5 205791 NFIX:4784 SotosgGgcA[AGatC|AGatc]cggcg AAGATC syndrome 2 B91 20-62044881-T- 5 361908KCNQ2:3785 Benign Gtcgt[AGggc|AGgGc]cgCAg TAGGGC familial neonatalseizures 1 B92 2-239757079-G- 6 204358 TWIST2: Barber-SaygCgCg[AgCgcc|Agcgcc]AgCgc GAGCGCC 117581 syndrome B93 9-140056954-C- 6384406 GRIN1:2902 Mental aactc[Cggcat|CGgcat]cgggg CCGGCAT retardation,autosomal dominant 8 B94 10-102510456- 6 28839 PAX2:5076 Renal colobomaTacTa[CgagAc|cgAgac]CgGCa A-ACGAGAC syndrome B95 3-123383092-T- 7 259621MYLK:4638 Visceral myopathy gtgCT[CGCtttc|cgCtttc]cTGga TCGCTTTC B966-7580155-G- 7 197069 DSP:1832 ArrhythmogeniccaaGglgaAaaTc|gaAaatc]gAgat GGAAAATC right ventricular cardiomyopathy,type 8 B97 13-32937340-C- 7 67137 BRCA2:675 Breast-GAagC[agAAgaT|agaAgat]cGGCt CAGAAGAT ovarian cancer, familial 2 B9813-32968822- 7 262861 BRCA2:675 Breast-ovarianCATtc[taGgact|TAGgaCT]Tgccc C-CTAGGACT cancer, familial 2 B9916-2142114-T- 7 442559 PKD1:5310 
Polycystic aAcgt[CGtaatC|CGTaatc]gCtggTCGTAATC kidney disease, adult type B100 17-7578448-G- 7 27419 TP53:7157Osteosarcoma gATGG[ccATGgc|ccAtggc]GCgga GCCATGGC B101 17-48273539-C- 7414018 COL1A1: Osteogenesis gTagC[ACCAtCa|aCcaTCa]tTtcC CACCATCA 1277imperfecta type I B102 18-48575180-G- 7 36142 SMAD4:4089Juvenile polyposis gataG[TgTCtGT|Tgtctgt]GtGAA GTGTCTGT syndrome B10319-11213391-G- 7 362671 LDLR:3949 Familial aAccg[CtgcAtT|CtG GCTGCATThyper- Catt]CcTCa cholesterolemia B104 19-11216256- 7 245719 LDLR:3949Familial GacAA|aTcTGAc|aTCtgac]gAGGa A-AATCTGAC hyper- cholesterolemiaB105 19-11231163-T- 7 246518 LDLR:3949 Familial ggCat[GCtgCTg|GCtgTGCTGCTG hyper- CTg]gccAG cholesterolemia B106 22-32200157-T- 7 259349DEPDC5: Epilepsy, ggtgT[GgatttG|gGatTtG)gTgTg TGGATTTG 9681 familialfocal, with variable foci 1 B107 2-166848493-G- 8 187713 SCN1A:6323Severe aaaAg[AAAATTCC|AAa GAAAATTCC myoclonic atTCC]aacag epilepsy ininfancy B108 6-33405537-C- 8 456125 SYNGAP1: Mental TCtgc[ctgGatga|cTgCCTGGATGA 8831 retardation, GAtGa]CAtgc autosomal dominant 5 B1098-61728946-C- 8 207553 CHD7:55636 CHARGE tcaGC[tcTTatcT|TCt CTCTTATCTassociation tAtcT]TcatT B110 8-116427276- 8 432065 TRPS1:7227Trichorhino- tcAcc[GtTgtTTt|Gt C-CGTTGTTTT phalangeal TGtTTT]GTttadysplasia type I B111 11-64572225-C- 8 419786 MEN1:4221 MultipletCgcc[ccAcggct|cc CCCACGGCT endocrine AcGgct]ccTcG neoplasia, type 1B112 13-32914325-A- 8 261315 BRCA2:675 Breast- tAAaa(TATCaCCt|taATATCACCT ovarian tcAcCt]tGtga cancer, familial 2 B113 13-32915019-A- 8261369 BRCA2:675 Hereditary GAGAA[cattCaTg|ca ACATTCATG breast andttCaTG]ttttG ovarian cancer syndrome B114 16-23619228-C- 8 465434 PALB2:Familial cancer ccCac[gctgaGag|gc CGCTGAGAG 79728 of breast TgAGaG]TCGtcB115 16-23634319-T- 8 465443 PALB2: Familial cancer ctTCT[acttGTtG|aCTACTTGTTG 79728 of breast TtgTtG]atCag B116 17-17118378-C- 8 467434FLCN: Multiple GCTtc[aaTctTat| CAATCTTAT 201163 fibro- aATcTTat]tcAggfolliculomas B117 19-1220430-A- 8 469353 STK11:6794 
Peutz-JeghersgCacA[AggacatC|Ag AAGGACATC syndrome GacAtC]Aagcc B118 19-11216107-T- 8245617 LDLR:3949 Familial aagaT[GgcTcgGa| TGGCTCGGA hyper-ggctcgGa]TgaGT cholesterolemia B119 19-11221433-A- 8 245939 LDLR:3949Familial Gccca[GcgaaGat|gc AGCGAAGAT hyper- gAaGat]GcgAa cholesterolemiaB120 19-11226844-C- 8 246266 LDLR:3949 Familial tactc[gCtGgtga|gCCGCTGGTGA hyper- tGGTga]CTGAA cholesterolemia B121 19-42794719-G- 8424692 CIC:23152 MENTAL cggcg[caAgAgAc|CaAG GCAAGAGAC RETARDATION,agac]ccgAa AUTOSOMAL DOMINANT 45 B122 7-142458427-A- 9 46926 PRSS1:5644Hereditary gAtGa[TgacAAgAt[Tga ATGACAAGAT pancreatitis caAgAt]cgttg B12310-43607601-T- 9 28980 RET:5979 Familial CggCt[ggAgTgtGa| TGGAGTGTGAmedullary GgAgTgtGatGgAgt thyroid carcinoma B124 10-43609946-T- 9 36267RET:5979 Multiple GaGct[GtGcCgcac| TGTGCCGCAC endocrine gtgCcgcAc]gGtganeoplasia, type 2a B125 17-42992481-A- 9 188181 GFAP:2670 Alexander’sccGca|gccgCagct|g AGCCGCAGCT disease CcgCAgct]cTcgC B126 5-176673777-A-10 207192 NSD1:64324 Sotos GTcaa[aagAGATtcC| AAAGAGATTCC syndrome 1aagagATtcC]AGgct B127 10-76789773-A- 10 47605 KAT6B: Young SimpsonCggGa[gCtGCAgcat| AGCTGCAGCAT 23522 syndrome GCtGCAgCAt]GCtGC B12813-48835344-C- 10 21019 ITM2B:9445 Dementia, gaAaC[TTTaaTTTGT|CTTTAATTTGT familial ttTAaTTTGT]tcTTg Danish B129 17-42328803-A- 1032797 SLC4A1:6521 Spherocytosis GgGCa[catcTgGgtG| ACATCTGGGTG type 4catctgggtG]atact B130 19-11218087-C- 10 434258 LDLR:3949 FamilialgACcC[AacaagTtCA| CAACAAGTTCA hyper- AacaaGTtCa]AGtgT cholesterolemiaB131 3-138664603-T- 11 354105 FOXL2:668 Blepharo- GagcT[GgcccgGcggC|TGGCCCGGCGGC phimosis, GgcCcgGcggc]GgcGc ptosis, and epicanthus inversusB132 8-61742962-G- 11 481218 CHD7:55636 CHARGE caaag[aAGAaaCTAtt|GAAGAAACTATT association aAGAaACtATt]aTtGA B133 8-61777949-T- 11 194201CHD7:55636 CHARGE TgAAT[aACCctCtgtc| TAACCCTCTGTC associationaacCcTctgtc]aGCTg B134 19-11216021-A- 11 434823 LDLR:3949 FamilialgcTCa[CctgtggTCCc| ACCTGTGGTCCC hyper- CctgtggtCCC]gCcag cholesterolemiaB135 19-11216250-A- 
11 245708 LDLR:3949 Familial tgcAa[GGaCAAatcTG|AGGACAAATCTG hyper- GgaCAaaTctG]acgAG cholesterolemia B13610-43609939-G- 12 36265 RET:5979 Multiple gtgcG[aCGaGctGTGcC|GACGAGCTGTGCC endocrine acGagctGtGcC]gcAcg neoplasia, type 2a B13719-11227654-T- 12 424332 LDLR:3949 Familial CcccT[tctccttggcCG|TTCTCCTTGGCCG hyper- TCtccTtggccG]Tcttt cholesterolemia B13815-73617341-T- 13 361763 HCN4:10021 Sick sinus Ccctt[ggtgAgCacgctg|TGGTGAGCA syndrome 2, ggtGAgCAcgctG]accAC CGCTG autosomal dominant B13919-11216242-C- 13 245695 LDLR:3949 Familial GcccC[gactgcAaGgaCA|CGACTGCAA hyper- gACtgcAaGGaCA]AaTct GGACA cholesterolemia B14019-11233897-A- 13 246545 LDLR:3949 Familial GCtaA[AggTCAGCtccac|AAGGTCAG hyper- AggTcAGCTccAc]agccg CTCCAC cholesterolemia B1415-176637714-G- 14 394847 NSD1:64324 Beckwith- tGggG[CAgCAaAtcAAGct|GCAGCAAAT Wiedemann CAgcaaAtcAAgct]CTatT CAAGCT syndrome B1426-117996952-G- 14 480769 NUS1: MENTAL cGctG[ctgcCgcGccGcct| GCTGCCGCG116150 RETARDATION, ctgcCgcGccGcct]ctgcc CCGCCT AUTOSOMAL DOMINANT 55.WITH SEIZURES B143 13-32971081-T- 14 261554 BRCA2:675 Breast-CATAttaCtgCAtGCaAAtg| TACTGCATG ovarian cancer, AcTgCAtgCaaAtg]ATccCCAAATG familial 2 B144 19-11216249-A- 14 245706 LDLR:3949 FamilialctgcA[aGGaCAAaTctgac| AAGGACAAA hyper- aGGacAAaTctGac]gAGga TCTGACcholesterolemia B145 2-189853347-A- 15 107099 COL3A1:1281 Ehlers-DanlosggtgA[accTGGgcAagCtGG| AACCTGGG syndrome, acctGGgcAagCtGg]TcCtT CAAGCTGGtype 4 B146 18-59992633-G- 15 204405 TNFRSF11A: Paget diseasetgCTg[cTcTgcGcgctGcTc| GCTCTGCGCGC 8792 of bone 2, cTcTgcGcgCtgcTc]gccCgTGCTC early-onset B147 3-41280628-C- 16 227080 CTNNB1: ExudativegatcC[TAgctaTCgTTcttt CTAGCTAT 1499 vitreo- t|tAGCtAtCgTTctTTt]cCGTTCTTTT retinopathy 1 Actc B148 3-71019923-G- 16 102121 FOXP1: MentalcgTgg[cTGcTcTgcAtGttt GCTGCTCTG 27086 retardation t|CTgctcTgcAtGTtTT]CATGTTTT with language TAata impairment and with or without autisticfeatures B149 11-31815221-T- 16 190738 PAX6:5080 Aniridia 1GGaaT[TggtTgGTAGAcAct TTGGTTGGTAG 
G|tggtTggTAGacactG]g ACACTG tgCT B15016-23647409-C- 16 466297 PALB2: Hereditary TGTCC[TCttctgCtgCTtCctcttctgc 79728 cancer- Tt|TctTCtgCtgCTtCTt] tgCTTCTT predisposing TctTCsyndrome B151 19-49469932-T- 16 31527 FTL:2512 Neuro-tggGt[ggcCcgGaggcTggG TGGCCCGGAG ferritinopathy c|GgcCcgGaggcTgggc]GCTGGGC tgGgc B152 3-138664693-T- 17 19905 FOXL2:668 Blepharophimosis,Cgggt[gGgGgtGcgGcgga TGGGGGTGCGGCG ptosis, ggc|gGgGgtGcgGcggagg GAGGCand epicanthus c]gGgGg inversus B153 3-138664705-G- 17 171758 FOXL2:668Blepharophimosis, cgGcg[gaggcgGgGgtGCgG GGAGGCGGGGG ptosis, andcc|gaggcgGgGgtGCggCct TGCGGCC epicanthus ggCgg inversus B1543-138664707-A- 17 178773 FOXL2:668 Blepharophimosis,Gcgga[ggcgGgGgtGCggccg AGGCGGGGGTGC ptosis, and gtggcgGgGgtGCggCcgg]GGCCGG epicanthus CggGC inversus B155 9-140674107-A- 17 431918 EHMT1:Chromosome 9q ttcCa[CccaAagcaGCTgtac ACCCAAAGCAGC 79813 deletionT|CccaAagcaGCTgTAcT] TGTACT syndrome TcTcC B156 10-103990523- 17 459553PITX3:5309 Cataract, cCCCg[cccaGgccCtgcag G-GCCCAGGCCC posteriorGGc|ccCAggccCtgcagGG TGCAGGGC polar, 4 c]ccCAg B157 16-2134369-G- 1775949 TSC2:7249 Tuberous cccTg[agcaaGtcCAGCtcc GAGCAAGTCCAGC sclerosisTc|AgcAaGtcCAGCtcCTc] TCCTC syndrome tccCg B158 17-17118596-T- 17 247655FLCN:201163 Hereditary AaCgt[GCgGcTgcGtGGacC TGCGGCTGCGTG cancer-tc|GcgGcTgcgtGGaCCTc] GACCTC predisposing cacga syndrome B15919-11216246-T- 17 362682 LDLR:3949 Familial Cgact[gcAagGaCAAaTctGTGCAAGGACAAA hyper- Ac|gcAagGaCAAaTctGac] TCTGAC cholesterolemia gAGgaB160 11-47353677-T- 18 23642 MYBPC3:4607 HypertrophicGcCCt[GcagAcaTaGaTgCc TGCAGACATAGAT cardiomyopathy CCc|gcagacaTaGaTGCcCCGCCCCC C]gtcaa B161 18-59992620-T- 18 204404 TNFRSF11A: FamilialCtgtt[cGcGctGctgCTgcT TCGCGCTGCTGCT 8792 expansile cTg|cGcgctGctgCTgcTcTGCTCTG osteolysis gtcGcgC B162 18-59992630-G- 18 21338 TNFRSF11A:Familial tGctg[CTgcTcTgcGcgct GCTGCTCTGCGCG 8792 expansileGctc|CTgctcTgcGcgCtg CTGCTC osteolysis cTc]gccCg B163 3-138664581-G- 19171757 FOXL2:668 Blepharophimosis, cggTg[gcTGggcTggcaGgG 
GGCTGGGCTGGCptosis, and cTGa|gcTGggcTggcaGgGc AGGGCTGA epicanthus TGa]gcTGg inversusB164 12-12870830-G- 19 181495 CDKN1B: Multiple cCAgg[caggcgGAGcACCccGCAGGCGGAGCA 1027 endocrine Aagc|cagGcgGAGcAccccA CCCCAAGC neoplasia,agc]ccTCg type 4 B165 19-11216046-G- 19 245580 LDLR:3949 FamilialcaGtg[caACAgcTcCacCtgc GCAACAGCTCCA hyper- atC|CaAcagcTcCacctgCatCCTGCATC cholesterolemia C]ccCca B166 19-11218155-G- 19 228148 LDLR:3949Familial gaCtg[ccgGgacTGgtcagatg GCCGGGACTGGT hyper-aa|cCGGgacTGgtcagATgAA] CAGATGAA cholesterolemia CCCAt B16719-11224421-G- 19 434302 LDLR:3949 Familial tcgtg[gtGgAtCCTGttcaTgGGTGGATCCTG hyper- ggt|gTGGAtCCTgttcatgg TTCATGGGT cholesterolemiagT]GCGTA B168 2-131355422-C- 20 20229 CFC1:55997 Heterotaxy,acccc[GcgcaCcCcTgtgccc CGCGCACCCCTGT visceral, 2, aCct|gcgcACcCcTgtgcCCGCCCACCT autosomal aCct]gcgcc B169 3-138664638-T- 20 354106 FOXL2:668Blepharophimosis, Gcggt[GgggcagGcgGcGgtG TGGGGCAGGCGGC ptosis, andcggc|GgggcagGcgGcGgtGc GGTGCGGC epicanthus gGC]ggcCg inversus B1709-135778022-C- 20 397173 TSC1:7248 Tuberous ATTcc[tctcgGtCatGctGCCTCTCGGTCATGCT sclerosis 1 agCTg|tctCGGtCatGCtgC GCAGCTG agCTg]tCtGaB171 19-11216243-G- 20 434241 LDLR:3949 Familial ccCCg[actgcAaGGacaAaTGACTGCAAGGACAA hyper- cTGAc|aCtgcAaGGaCAAaT ATCTGAC cholesterolemiactGac]gAGga B172 20-10632292-C- 20 270939 JAG1:182 AlagilleGTCTC[CTtAcaGCTgCctCtg CCTTACAGCTGCCT syndrome 1 tTgT|CTtAcaGCTgCctCtgtCTGTTGT Tgt]gacag

IV. Protospacer Adjacent Motif (PAM) Sequences

Below, in Table 9, are exemplary PAM sequences. See addgene.org/crispr/guide/#pam-table and blog.addgene.org/xcas9-engineering-a-crispr-variant-with-pam-flexibility.

TABLE 9 Exemplary PAM Sequences

Legend  Species_and_Variant_of_Cas9    Side  Pattern    Expanded                              Cleavage Site         Width
A/a     xCas9_NG                       3′    NG         [ACGT]G                               −3 from start         2
B/b     xCas9_GAA                      3′    GAA        GAA                                   −3 from start         3
C/c     xCas9_GAT                      3′    GAT        GAT                                   −3 from start         3
D/d     SpCas9                         3′    NGG        [ACGT]GG                              −3 from start         3
E/e     SpCas9 VRER variant            3′    NGCG       [ACGT]GCG                             −3 from start         4
F/f     SpCas9 EQR variant             3′    NGAG       [ACGT]GAG                             −3 from start         4
G/g     SpCas9 VQR variant             3′    NGAN|NGNG  [ACGT]GA[ACGT]|[ACGT]G[ACGT]G         −3 from start         4
H/h     SaCas9                         3′    NNGRRT     [ACGT][ACGT]G[AG][AG]T                −3 from start         6
I/i     NMe1                           3′    NNNNGATT   [ACGT][ACGT][ACGT][ACGT]GATT          −3 from start         8
J/j     CjeCas9                        3′    NNNNRYAC   [ACGT][ACGT][ACGT][ACGT][AG][CT]AC    −3 from start         8
K/k     AsCpf1 and LbCpf1              5′    TTTV       TTT[ACG]                              approx +18 from end   4
M/m     AsCpf1 and LbCpf1 RR variant   5′    TYCV       T[CT]C[ACG]                           approx +18 from end   4
N/n     AsCpf1 RVR variant             5′    TATV       TAT[ACG]                              approx +18 from end   4
O/o     FnCpf1                         5′    TTV        TT[ACG]                               approx +18 from end   3
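The expanded patterns in Table 9 are written as character classes, so they can be used directly as regular expressions to locate candidate PAM sites. The following is a minimal Python sketch under stated assumptions: the dictionary holds only a subset of the Table 9 entries, the function name find_pams is hypothetical, and only the strand supplied is searched (the reverse complement would have to be scanned separately). The test sequence is the mutant allele written out from entry B1 of Table 8.

```python
import re

# Subset of the "Expanded" column of Table 9, keyed by nuclease name.
PAM_PATTERNS = {
    "SpCas9": "[ACGT]GG",
    "SpCas9 VQR variant": "[ACGT]GA[ACGT]|[ACGT]G[ACGT]G",
    "SaCas9": "[ACGT][ACGT]G[AG][AG]T",
    "AsCpf1 and LbCpf1": "TTT[ACG]",
}


def find_pams(sequence, pattern):
    """Return 0-based start positions of all PAM matches on the given strand.

    A zero-width lookahead is used so that overlapping matches are reported.
    """
    lookahead = re.compile("(?=(?:%s))" % pattern)
    return [match.start() for match in lookahead.finditer(sequence.upper())]


if __name__ == "__main__":
    mutant_allele = "AGGATTGGGTGGGAGCAG"  # Aggat[Tggg|TgGG]agcag without brackets
    for name, pattern in PAM_PATTERNS.items():
        print(name, find_pams(mutant_allele, pattern))
```

Because a microduplication repeats the same sequence twice, a PAM that falls inside the repeat unit will usually be found at two positions one unit length apart, which is worth keeping in mind when choosing a guide.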

V. Adeno-Associated Virus Nucleic Acid Delivery Platforms

Adeno-associated virus (AAV) is a small virus which infects humans and some other primate species. AAV is not currently known to cause disease. In many cases, AAV vectors integrate into the host cell genome, making them useful as a gene therapy delivery platform. Gene therapy vectors using AAV can infect both dividing and quiescent cells and persist in an extrachromosomal state without integrating into the genome of the host cell, although in the native virus some integration of virally carried genes into the host genome does occur. Deyle et al., (August 2009) “Adeno-associated virus vector integration” Current Opinion in Molecular Therapeutics 11(4):442-447. These features make AAV a very attractive candidate for creating viral vectors for gene therapy. Grieger et al., (2005) “Adeno-associated virus as a gene therapy vector: vector development, production and clinical applications” Advances in Biochemical Engineering/Biotechnology 99:119-145. Recent human clinical trials using AAV for gene therapy in the retina have shown promise. Maguire et al., (May 2008) “Safety and efficacy of gene transfer for Leber's congenital amaurosis” The New England Journal of Medicine 358(21):2240-2248. AAV belongs to the genus Dependoparvovirus, which in turn belongs to the family Parvoviridae. The virus is a small (20 nm), replication-defective, nonenveloped virus.

Wild-type AAV has attracted considerable interest from gene therapy researchers due to a number of features. Chief amongst these is the virus's apparent lack of pathogenicity. It can also infect non-dividing cells and has the ability to stably integrate into the host cell genome at a specific site (designated AAVS1) in human chromosome 19. Kotin et al., (March 1990) “Site-specific integration by adeno-associated virus” PNAS USA 87(6):2211-2215; and Surosky et al., (October 1997) “Adeno-associated virus Rep proteins target DNA sequences to a unique locus in the human genome” Journal of Virology 71(10):7951-7959. This feature makes it somewhat more predictable than retroviruses, which present the threat of random insertion and mutagenesis, which is sometimes followed by development of a cancer.

The AAV genome integrates most frequently into the site mentioned, while random incorporations into the genome take place with a negligible frequency. Development of AAVs as gene therapy vectors, however, has eliminated this integrative capacity by removal of the rep and cap genes from the DNA of the vector. The desired gene, together with a promoter to drive transcription of the gene, is inserted between the inverted terminal repeats (ITRs) that aid in concatemer formation in the nucleus after the single-stranded vector DNA is converted by host cell DNA polymerase complexes into double-stranded DNA. AAV-based gene therapy vectors form episomal concatemers in the host cell nucleus. In non-dividing cells, these concatemers remain intact for the life of the host cell. In dividing cells, AAV DNA is lost through cell division, since the episomal DNA is not replicated along with the host cell DNA. Random integration of AAV DNA into the host genome is detectable but occurs at very low frequency. AAVs also present very low immunogenicity, seemingly restricted to generation of neutralizing antibodies, while they induce no clearly defined cytotoxic response. This feature, along with the ability to infect quiescent cells, gives them an advantage over adenoviruses as vectors for human gene therapy. Daya et al., (October 2008) “Gene therapy using adeno-associated virus vectors” Clinical Microbiology Reviews 21(4):583-593; Chirmule et al., (September 1999) “Immune responses to adenovirus and adeno-associated virus in humans” Gene Therapy 6(9):1574-1583; Hernandez et al., (October 1999) “Latent adeno-associated virus infection elicits humoral but not cell-mediated immune responses in a nonhuman primate model” Journal of Virology 73(10):8549-8558; and Ponnazhagan et al., (April 1997) “Adeno-associated virus 2-mediated gene transfer in vivo: organ-tropism and expression of transduced sequences in mice” Gene 190(1):203-210.

VI. Pharmaceutical Formulations And Compositions

The present invention further provides pharmaceutical compositions (e.g., comprising the nucleases described above). The pharmaceutical compositions of the present invention may be administered in a number of ways depending upon whether local or systemic treatment is desired and upon the area to be treated. Administration may be topical (including ophthalmic and to mucous membranes including vaginal and rectal delivery), pulmonary (e.g., by inhalation or insufflation of powders or aerosols, including by nebulizer; intratracheal, intranasal, epidermal and transdermal), oral or parenteral. Parenteral administration includes intravenous, intraarterial, subcutaneous, intraperitoneal or intramuscular injection or infusion; or intracranial, e.g., intrathecal or intraventricular, administration.

Pharmaceutical compositions and formulations for topical administration may include transdermal patches, ointments, lotions, creams, gels, drops, suppositories, sprays, liquids and powders. Conventional pharmaceutical carriers, aqueous, powder or oily bases, thickeners and the like may be necessary or desirable.

Compositions and formulations for oral administration include powders or granules, suspensions or solutions in water or non-aqueous media, capsules, sachets or tablets. Thickeners, flavoring agents, diluents, emulsifiers, dispersing aids or binders may be desirable.

Compositions and formulations for parenteral, intrathecal or intraventricular administration may include sterile aqueous solutions that may also contain buffers, diluents and other suitable additives such as, but not limited to, penetration enhancers, carrier compounds and other pharmaceutically acceptable carriers or excipients.

Pharmaceutical compositions of the present invention include, but are not limited to, solutions, emulsions, and liposome-containing formulations. These compositions may be generated from a variety of components that include, but are not limited to, preformed liquids, self-emulsifying solids and self-emulsifying semisolids.

The pharmaceutical formulations of the present invention, which may conveniently be presented in unit dosage form, may be prepared according to conventional techniques well known in the pharmaceutical industry. Such techniques include the step of bringing into association the active ingredients with the pharmaceutical carrier(s) or excipient(s). In general, the formulations are prepared by uniformly and intimately bringing into association the active ingredients with liquid carriers or finely divided solid carriers or both, and then, if necessary, shaping the product.

The compositions of the present invention may be formulated into any of many possible dosage forms such as, but not limited to, tablets, capsules, liquid syrups, soft gels, suppositories, and enemas. The compositions of the present invention may also be formulated as suspensions in aqueous, non-aqueous or mixed media. Aqueous suspensions may further contain substances that increase the viscosity of the suspension including, for example, sodium carboxymethylcellulose, sorbitol and/or dextran. The suspension may also contain stabilizers.

In one embodiment of the present invention, the pharmaceutical compositions may be formulated and used as foams. Pharmaceutical foams include formulations such as, but not limited to, emulsions, microemulsions, creams, jellies and liposomes. While basically similar in nature, these formulations vary in the components and the consistency of the final product.

Agents that enhance uptake of oligonucleotides at the cellular level may also be added to the pharmaceutical and other compositions of the present invention. For example, cationic lipids, such as lipofectin (U.S. Pat. No. 5,705,188), cationic glycerol derivatives, and polycationic molecules, such as polylysine (WO 97/30731), also enhance the cellular uptake of oligonucleotides.

The compositions of the present invention may additionally contain other adjunct components conventionally found in pharmaceutical compositions. Thus, for example, the compositions may contain additional, compatible, pharmaceutically-active materials such as, for example, antipruritics, astringents, local anesthetics or anti-inflammatory agents, or may contain additional materials useful in physically formulating various dosage forms of the compositions of the present invention, such as dyes, flavoring agents, preservatives, antioxidants, opacifiers, thickening agents and stabilizers. However, such materials, when added, should not unduly interfere with the biological activities of the components of the compositions of the present invention. The formulations can be sterilized and, if desired, mixed with auxiliary agents, e.g., lubricants, preservatives, stabilizers, wetting agents, emulsifiers, salts for influencing osmotic pressure, buffers, colorings, flavorings and/or aromatic substances and the like which do not deleteriously interact with the nucleic acid(s) of the formulation.

Dosing is dependent on severity and responsiveness of the disease state to be treated, with the course of treatment lasting from several days to several months, or until a cure is effected or a diminution of the disease state is achieved. Optimal dosing schedules can be calculated from measurements of drug accumulation in the body of the patient. The administering physician can easily determine optimum dosages, dosing methodologies and repetition rates. Optimum dosages may vary depending on the relative potency of individual oligonucleotides, and can generally be estimated based on EC₅₀s found to be effective in in vitro and in vivo animal models or based on the examples described herein. In general, dosage is from 0.01 μg to 100 g per kg of body weight, and may be given once or more daily, weekly, monthly or yearly. The treating physician can estimate repetition rates for dosing based on measured residence times and concentrations of the drug in bodily fluids or tissues. Following successful treatment, it may be desirable to have the subject undergo maintenance therapy to prevent the recurrence of the disease state, wherein the compound is administered in maintenance doses, ranging from 0.01 μg to 100 g per kg of body weight, once or more daily, to once every 20 years.

EXPERIMENTAL

Example I Human Subjects

Cells for reprogramming TCAP iPSC lines were recovered, with consent, from a skin biopsy from a patient with LGMD2G under a UMMS-IRB-approved protocol and assigned a de-identified ID number unlinked to the patient's medical record. The consent process included conditions for sharing de-identified samples and information with other investigators. No PHI will be shared at any time per HIPAA guidelines.

Example II Cell Culture

LGMD2G primary dermal fibroblasts were isolated from a skin biopsy from a patient with LGMD2G as described³¹. Fibroblasts were reprogrammed using the CytoTune 2.0 iPS Sendai Virus Reprogramming Kit (Thermo-Fisher) according to the manufacturer's directions. Clonal lines were expanded for 6-10 passages before banking. Immunostaining was performed to confirm the absence of Sendai virus and expression of OCT4. Human iPSCs were cultured in iPS-Brew XF medium (Miltenyi Biotec) and passaged every 3-5 days with Passaging Solution (Miltenyi Biotec) according to the manufacturer's directions.

Myoblasts were induced from iPSCs using a modification of the Genea Biocells protocol¹¹. Following the generation of differentiated myotubes as described, cells were reseeded and cultured in human primary myoblast medium³². CD56+ cells were purified by FACS using an anti-CD56-APC antibody (BD Biosciences) or MACS (Miltenyi Biotec) according to the manufacturer's directions. Myogenicity was confirmed by immunostaining myoblast and myotube cultures using the mouse monoclonal antibodies MyoD clone 5.8 (Dako) and MF20 (DSHB) (data not shown).

A lymphoblastoid cell line from B lymphocytes (B-LCL) derived from a patient with HPS1 who was homozygous for the 16-bp microduplication was purchased from Coriell (Catalog GM14606). A lymphoblastoid cell line from B lymphocytes (B-LCL) derived from a patient with Tay-Sachs who was homozygous for the GATA microduplication in HEXA was purchased from Coriell (Catalog GM11852). These cell lines were cultured following the recommended procedure using RPMI 1640 with 2 mM L-glutamine, 15% FBS and 1% penicillin/streptomycin.

HEK293T cells were cultured following the recommended procedure using Dulbecco's modified Eagle's medium (DMEM), 10% FBS and 1% penicillin/streptomycin. All cultures were maintained in a humidified incubator with 5% CO₂ at 37° C.

Example III Purification of SpyCas9 and LbaCas12a

Protein purification for 3xNLS-SpCas9 and LbaCas12a-2xNLS followed a common protocol. The generation and characterization of the 3xNLS-SpCas9 (Addgene #114365) and LbaCas12a-2xNLS (Addgene #114366) constructs have been described (Wu et al., Nature Medicine (in press); Liu et al., Nucleic Acids Research (PMID 30892626)). The pET21a plasmid backbone (Novagen) was used to drive the expression of a hexa-His-tagged version of each protein. The plasmid expressing 3xNLS-SpCas9 (or LbaCas12a-2xNLS) was transformed into Escherichia coli Rosetta (DE3)pLysS cells (EMD Millipore) for protein production. Cells were grown at 37° C. to an OD600 of ~0.2, then shifted to 18° C. and induced at an OD600 of ~0.4 for 16 h with isopropyl β-D-1-thiogalactopyranoside (IPTG, 1 mM final concentration).

Following induction, cells were pelleted by centrifugation, resuspended in Ni-NTA buffer (20 mM TRIS pH 7.5, 1 M NaCl, 20 mM imidazole, 1 mM TCEP) supplemented with HALT Protease Inhibitor Cocktail, EDTA-Free (100×) (ThermoFisher), and lysed with an M-110s Microfluidizer (Microfluidics) following the manufacturer's instructions. The protein was purified from the cell lysate using Ni-NTA resin, washed with five volumes of Ni-NTA buffer and then eluted with elution buffer (20 mM TRIS, 500 mM NaCl, 500 mM imidazole, 10% glycerol, pH 7.5). The 3xNLS-SpCas9 (or LbaCas12a) protein was dialysed overnight at 4° C. in 20 mM HEPES, 500 mM NaCl, 1 mM EDTA, 10% glycerol, pH 7.5.

Subsequently, the protein was step dialysed from 500 mM NaCl to 200 mM NaCl (final dialysis buffer: 20 mM HEPES, 200 mM NaCl, 1 mM EDTA, 10% glycerol, pH 7.5). Next, the protein was purified by cation exchange chromatography (5 ml HiTrap-S column; buffer A: 20 mM HEPES pH 7.5, 1 mM TCEP; buffer B: 20 mM HEPES pH 7.5, 1 M NaCl, 1 mM TCEP; flow rate 5 ml/min; column volume (CV) 5 ml) followed by size-exclusion chromatography (SEC) on a Superdex-200 (16/60) column (isocratic size-exclusion running buffer: 20 mM HEPES pH 7.5, 150 mM NaCl, 1 mM TCEP for 3xNLS-SpCas9, or 20 mM HEPES pH 7.5, 300 mM NaCl, 1 mM TCEP for LbaCas12a-2xNLS).

The primary protein peak from the SEC was concentrated in an Ultra-15 Centrifugal Filter Ultracel-30K (Amicon) to a concentration of around 100 μM based on absorbance at 280 nm. The purified protein was assessed by SDS-PAGE/Coomassie staining to be >95% pure, and protein concentration was quantified with a Pierce BCA Protein Assay Kit (ThermoFisher Scientific). Protein was stored at −80° C. until further use.

Example IV In Vitro Transcription of Guide RNAs

The DNA cassette containing the U6 promoter and the sgRNA framework for SpyCas9 was cloned from the pLKO1-puro vector³³ into the pBluescript SK II+ backbone (Liu et al., Nucleic Acids Research, submitted). Plasmids expressing each guide RNA from the U6 promoter were constructed by annealing oligonucleotides encoding the guide RNA and cloning them into the BfuAI cleavage sites in this vector. Templates for in vitro transcription of SpyCas9 guides were amplified from the cognate plasmids using NEB Q5 High-Fidelity DNA Polymerase for 30 cycles (98° C., 15 s; 65° C., 25 s; 72° C., 20 s) using primer sets designed to include the T7 scaffold.

To generate CRISPR RNA (crRNA) for LbaCas12a, templates for in vitro transcription were generated by PCR amplification of oligonucleotides designed to include the T7 scaffold along with the guide RNA and a 15-mer overlap sequence to allow annealing between the oligos. The oligonucleotides encoded the full-length direct repeat crRNA sequence (Liu et al., Nucleic Acids Research, PMID 30892626). Thirty cycles of amplification were conducted using NEB Q5 High-Fidelity DNA Polymerase (98° C., 15 s; 60° C., 25 s; 72° C., 20 s). The PCR products were purified using the Zymo DNA Clean & Concentrator Kit (Zymo Cat. #D4005).

In vitro transcription reactions were performed using the HiScribe T7 High Yield RNA Synthesis Kit (NEB Cat. #E2040S) with 300 ng of PCR product as template. After incubation for 16 h at 37° C., samples were treated with DNase I for 40 min at 37° C. to remove any DNA contamination. Each guide RNA was purified using the Zymo RNA Clean and Concentrator Kit. Final RNA concentration was measured using a Nanodrop, and RNA was stored at −80° C. until further use.

Example V Electroporation of Cell Lines

3xNLS-SpyCas9 protein was precomplexed with sgRNAs, either purchased from Synthego or made in-house by T7 transcription, and electroporated into cells using the Neon transfection system (Thermo Fisher).

Electroporation of iPSCs

After washing with PBS, iPSCs were dissociated into single cells with 3:1 TrypLE:0.5 mM EDTA and neutralized with Ham's F10 + 20% FBS. To form RNP complexes, 20 pmol 3xNLS-SpyCas9 protein and 25 pmol gRNA were combined in 10 μl Neon Buffer R and incubated for 10 min at room temperature. iPSCs (1×10⁵) were resuspended in 10 μl RNP-Buffer R mix and then nucleofected as follows: pulse voltage 1,500 V, pulse width 20 ms, pulse number 1.

After transfection, the cells were plated onto Matrigel-coated 24-well plates with iPS Brew XF supplemented with 10 μM Y27632 for expansion and grown in a humidified incubator at 37° C., 5% CO₂, for 4 days before being harvested for analysis. iPSC-derived myoblasts were electroporated using two pulses of 1,400 V and 20 ms width, plated onto a 24-well dish containing pre-warmed antibiotic-free human primary myoblast growth medium, and cultured for four to six days before analysis.

Electroporation of HPS1 Patient-Derived B-LCL Cells

Forty (40) pmol of 3xNLS-SpyCas9 protein was precomplexed with 50 pmol of sgRNA in buffer R for 10-20 min at room temperature in a final volume of 12 μl. Three hundred thousand cells per reaction were resuspended in 10 μl of RNP-buffer R mix and electroporated with 2 pulses at 1,700 V for 20 ms using the 10-μl tip. Cells were then plated in 24-well plates with 500 μl of pre-equilibrated antibiotic-free culture medium and grown in a humidified incubator at 37° C. and 5% CO₂ for 7 days before indel analysis.

For the PARP-1 inhibition experiments, 300,000 HPS1 patient-derived B-LCL cells were treated with 10 μM or 20 μM rucaparib camsylate (Sigma-Aldrich PZ0036) in standard growth medium for 24 h. Treated cells were electroporated with SpyCas9 RNPs following the protocol described above. Following another 24 h incubation in rucaparib-containing medium, cells were resuspended in PARP-1 inhibitor-free medium and harvested for analysis after 7 days.

Electroporation of HEK293T Cells

Twenty (20) pmol of 3xNLS-SpyCas9 protein and 25 pmol of in vitro transcribed sgRNA were pre-complexed in Neon Buffer R for 10-20 min at room temperature. One hundred thousand cells per reaction were resuspended in 10 μl of RNP-buffer R mix and nucleofected with the SpyCas9-guide RNA complex using two pulses at 1,150 V for 20 ms using the 10-μl tip. Cells were then plated in 24-well plates with 500 μl of pre-equilibrated antibiotic-free culture medium and grown for 3 days before analysis. For Cas12a editing experiments at endogenous microduplications, 80 pmol of LbaCas12a protein was pre-complexed with 100 pmol of in vitro transcribed crRNA, and 100,000 cells per reaction were nucleofected as described above.

Example VI Indel Analysis By TIDE

Genomic DNA was extracted from HEK293T cells using the GenElute Mammalian Genomic DNA Miniprep Kit (Sigma Aldrich) according to the manufacturer's instructions. The DNA region containing the 24-bp microduplication was amplified from genomic DNA template with the primers listed in Table 10 using NEB Q5 High-Fidelity DNA Polymerase (98° C., 15 s; 67° C., 25 s; 72° C., 20 s)×30 cycles. See, Table 10.

Table 10: List of primers used to amplify genomic regions for TIDE analysis

Primer name     Primer sequence
Endo_24bp_F     GAAGCGCTACCTGATTCCAATTC
Endo_24bp_R     TGGCAGTTAGGAAGGTTGTATCG

Subsequently, the PCR product was purified using the DNA Clean & Concentrator-5 kit (Zymo Research) and sequenced. Sanger sequencing trace data were analysed using the TIDE webtool at tide.nki.nl/ to infer the compositions of indels created at the sites of DSBs³⁴.

Example VII Library Construction for Illumina Deep Sequencing

Library construction for deep sequencing was performed using a modified version of a previously described protocol²⁶. In brief, iPSCs and myoblasts were harvested following nuclease treatment and genomic DNA was extracted using the GenElute Mammalian Genomic DNA Miniprep Kit (Sigma G1N350). Genomic loci spanning the target sites were PCR amplified with locus-specific primers carrying tails complementary to the TruSeq adapters (Deepseq_TCAP_primer_fwd and Deepseq_TCAP_primer_rev). Fifty (50) nanograms of input genomic DNA was PCR amplified with Q5 High-Fidelity DNA Polymerase (New England Biolabs): (98° C., 15 s; 67° C., 25 s; 72° C., 20 s)×30 cycles. Next, 0.1 μl of each PCR reaction was amplified with barcoded primers to reconstitute the TruSeq adapters using Q5 High-Fidelity DNA Polymerase (New England Biolabs): (98° C., 15 s; 67° C., 25 s; 72° C., 20 s)×10 cycles. Products were qualitatively analysed by gel electrophoresis. Equal amounts of the products were pooled and gel-purified using the QIAquick Gel Extraction Kit (Qiagen Cat. #28704). The purified library was deep sequenced using a paired-end 150-bp Illumina MiSeq run.

Example VIII Illumina Deep Sequencing Analysis

MiSeq data analysis was performed using Unix-based software tools. First, FastQC (version 0.11.3; bioinformatics.babraham.ac.uk/projects/fastqc/) was used to determine the quality of paired-end sequencing reads (R1 and R2 fastq files). Next, a paired-end read merger (PEAR; version 0.9.8)³⁵ was used to pool raw paired-end reads and generate single merged high-quality full-length reads.

Reads were then filtered according to quality via FASTQ³⁶ for a mean PHRED quality score above 30 and a minimum per-base score above 24. After that, BWA (version 0.7.5) and SAMtools (version 0.1.19) were used to align each group of filtered reads to a corresponding reference sequence.
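The two quality thresholds can be illustrated with a minimal Python sketch. It is not the Galaxy FASTQ tool used above; the function names and the Phred+33 encoding are assumptions made for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality(quality_string, min_mean=30, min_base=24):
    """Keep a read only if its mean score is above min_mean and
    every base scores above min_base (both comparisons are strict)."""
    scores = phred_scores(quality_string)
    if not scores:
        return False  # an empty read cannot pass
    mean_score = sum(scores) / len(scores)
    return mean_score > min_mean and min(scores) > min_base
```

A read is rejected as soon as a single base falls to the per-base limit, even when its mean score is high.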

To determine lesion type, frequency, size and distribution, all edited reads from each experimental replicate were combined and aligned, as described above. Lesion types and frequencies were then catalogued in a text output format at each base using bam-readcount. For each treatment group, the average background lesion frequencies (based on lesion type, position and frequency) of the triplicate negative control group were subtracted to obtain the nuclease-dependent lesion frequencies.
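The background subtraction can be sketched as follows. This is an illustrative Python example, not the indel_background_filtering.R script; the dictionary layout keyed by (lesion type, position), the function names, and the clamping of negative results to zero are all assumptions.

```python
from collections import defaultdict


def mean_background(control_replicates):
    """Average lesion frequencies across negative-control replicates.

    Each replicate maps (lesion_type, position) -> frequency. A lesion that
    is absent from a replicate counts as 0 for that replicate.
    """
    if not control_replicates:
        return {}
    totals = defaultdict(float)
    for replicate in control_replicates:
        for key, freq in replicate.items():
            totals[key] += freq
    n = len(control_replicates)
    return {key: total / n for key, total in totals.items()}


def subtract_background(treated, background):
    """Subtract the mean background per lesion; clamp at zero (assumption)."""
    return {
        key: max(freq - background.get(key, 0.0), 0.0)
        for key, freq in treated.items()
    }
```

Lesions seen only in the treated sample pass through unchanged, because their background frequency is taken as zero.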

Example IX Library Construction for UMI-Based Illumina Deep Sequencing

The construction of the UMI-based library used a linear amplification step to incorporate UMIs within the amplicons from the target locus¹⁵. HPS1 B-LCL cells and HEK293T cells were harvested following nuclease treatment for genomic DNA extraction using the GenElute Mammalian Genomic DNA Miniprep Kit (Sigma G1N350).

Randomized unique molecular identifiers (UMIs) were incorporated within the 5′ locus-specific primers carrying tails complementary to TruSeq adaptors. In brief, 50 ng of input genomic DNA was linearly amplified with NEB Q5 High-Fidelity DNA Polymerase (98° C., 15 s; 67° C., 25 s; 72° C., 20 s) for 10 cycles using the 5′ locus-specific primer with the TruSeq adaptor conjugated to a UMI sequence.

Next, a 5′ constant primer along with the 3′ locus-specific primer with TruSeq adaptor were added and the product was further amplified for 30 cycles. Indexes were then incorporated by amplifying diluted PCR products with barcoded primers using NEB Q5 High-Fidelity DNA Polymerase (98° C., 15 s; 67° C., 25 s; 72° C., 20 s) for 10 cycles. Products were qualitatively analysed by gel electrophoresis. Equal amounts of the products were pooled and gel-purified using the QIAquick Gel Extraction Kit (Qiagen Cat. #28704) for DNA recovery. The purified library was deep sequenced using a paired-end 150-bp Illumina MiSeq run.

Example X UMI-Based Deep Sequencing Analysis

The analysis of the UMI-tagged deep sequencing reads was adapted from a previous protocol¹⁵. Initially, BWA (version 0.7.5) and SAMtools (version 0.1.19) were used to align each group of filtered merged-read pairs to a corresponding reference sequence, ignoring the unique molecular barcodes. Next, a custom Python and PySAM script was used to process mapped reads into counts of UMI-labelled reads for each target. The mapped reads were filtered by requiring a mapping quality (MAPQ) larger than 30. Alignments were sorted into different categories of indels using VarScan 2³⁷.
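The MAPQ filter can be illustrated on plain SAM text. This sketch deliberately avoids PySAM and is not the custom script referred to above; the function name is hypothetical, and it relies only on the SAM column order (FLAG in column 2, MAPQ in column 5).

```python
def passes_mapq(sam_line, min_mapq_exclusive=30):
    """Return True for a mapped SAM record whose MAPQ is larger than the threshold."""
    if sam_line.startswith("@"):  # header lines carry no alignment
        return False
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2: bitwise FLAG
    mapq = int(fields[4])  # column 5: mapping quality
    is_unmapped = bool(flag & 0x4)  # FLAG bit 0x4 marks an unmapped read
    return (not is_unmapped) and mapq > min_mapq_exclusive
```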

Next, UMI duplicates were identified to create a minimal set of amplicons that can account for the full set of reads with unique UMIs. For each unique UMI, a minimum of five observations of the same sequence was required to consider the sequence to have a low likelihood of being an artefact (sequencing error in the UMI element). For sequences that met this threshold, all common sequences associated with the UMI were consolidated to one read for analysis of the distribution of sequence modifications that were present at a locus. The resulting UMI number tables, which describe the type of each sequence modification and its length, were concatenated and loaded into GraphPad Prism 7 for data visualization. Microsoft Excel version 16.21.1 was used for statistical analysis.
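The consolidation rule can be sketched in Python. This is a simplified illustration rather than the custom script used for the analysis; the function name is hypothetical, and choosing the most frequent sequence per UMI as its representative is an assumption.

```python
from collections import Counter, defaultdict


def consolidate_umis(reads, min_observations=5):
    """Collapse (umi, sequence) pairs to one representative read per UMI.

    The most frequent sequence for each UMI is taken as its representative
    (assumption). The UMI is kept only if that sequence was observed at
    least min_observations times.
    """
    by_umi = defaultdict(Counter)
    for umi, sequence in reads:
        by_umi[umi][sequence] += 1
    consolidated = {}
    for umi, counts in by_umi.items():
        sequence, observations = counts.most_common(1)[0]
        if observations >= min_observations:
            consolidated[umi] = sequence
    return consolidated
```

Each surviving UMI contributes exactly one read to the downstream indel tally, so PCR duplicates do not inflate the frequency of any outcome.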

Example XI PacBio Library Preparation

The single-molecule, real-time (SMRT) sequencing protocol was modified from Pacific Biosciences (PacBio). Nuclease-treated patient-derived iPSCs were harvested for genomic DNA extraction with the GenElute Mammalian Genomic DNA Miniprep Kit (Sigma G1N350). In brief, regions that flanked the TCAP target site were PCR amplified using locus-specific primers. See, Table 10. The forward primer was designed to have the barcode sequence followed by the UMI and locus-specific primer sequence. The reverse primer contained the barcode followed by the locus-specific primer sequence. Input DNA (25-50 ng) was PCR amplified with Phusion High Fidelity DNA Polymerase (New England Biolabs): (98° C., 15 s; 65° C., 25 s; 72° C., 18 s)×30 cycles. The products were qualitatively analysed by gel electrophoresis and subsequently gel purified with the QIAquick Gel Extraction Kit (Qiagen Cat. #28704). The purified products were submitted to the UMASS Medical School Deep Sequencing Core for SMRTbell library preparation and sequenced using a Pacific Biosciences Sequel instrument.

Example XII PacBio Sequencing Data Analysis

For PacBio sequencing data analysis, Minimap2 (version 2.14)³⁸ was used to align the raw Consensus_ROI (reads_of_insert.fastq) data to the 2-kb reference sequence. Alignment quality control and filtering were performed using a custom Perl script to remove errors and filter out alignments with poor quality. For variation calling, a custom Python script was used to extract deletions or insertions larger than 5 bp for each read from the SAM files. Subsequently, deletions or insertions were classified into different groups on the basis of their length. IGV (version 2.4.16) was used for visualization of the aligned reads using Quick consensus mode³⁹.
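Extracting large insertions and deletions from a SAM record amounts to walking its CIGAR string. The sketch below is an illustration under that assumption and is not the custom script used for the analysis; the function name and the returned tuple layout are hypothetical.

```python
import re

# One CIGAR token is a length followed by a single operation letter.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def large_indels(cigar, ref_start, larger_than=5):
    """Return (kind, reference_position, length) for every insertion or
    deletion longer than `larger_than` bases in one alignment.

    ref_start is the 1-based POS field of the SAM record. For an insertion,
    the reported position is the reference base that follows it.
    """
    events = []
    ref_pos = ref_start
    for length_text, op in CIGAR_TOKEN.findall(cigar):
        length = int(length_text)
        if op in ("I", "D") and length > larger_than:
            kind = "insertion" if op == "I" else "deletion"
            events.append((kind, ref_pos, length))
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length
    return events
```

Events of 5 bp or fewer are ignored, which matches the size cut-off stated above; the resulting lengths can then be binned into groups.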

Example XIII Clonal Analysis of iPSCs

Following confirmation of MMEJ-mediated correction in the population of LGMD2G iPSCs, clonal analysis was performed. Cells from the corrected population were seeded into 96-well plates in the presence of Y27632 at a frequency of 0.8 cells per well. iPSC clones were cultured for several weeks in iPS Brew XF (Miltenyi Biotec) before being collected for sequence analysis by deep sequencing.
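The effect of seeding at a mean of 0.8 cells per well can be estimated if the number of cells per well is assumed to follow a Poisson distribution. That assumption, and the function names below, are illustrative and are not stated in the protocol.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well for a Poisson mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def well_occupancy(mean_cells_per_well):
    """Fractions of empty, single-cell and multi-cell wells, plus the
    fraction of occupied wells that started from a single cell."""
    empty = poisson_pmf(0, mean_cells_per_well)
    single = poisson_pmf(1, mean_cells_per_well)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        "single_among_occupied": single / (1.0 - empty),
    }
```

Under this model, calling `well_occupancy(0.8)` gives roughly 45% empty wells and about 65% of occupied wells founded by a single cell, which is why sequence confirmation of each clone remains necessary.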

Example XIV Myoblast Differentiation and Detection of Telethonin Expression

iPSC-derived myoblasts were plated into 0.1% gelatin-coated 6-well plates at a density of 100,000 cells per well in myoblast expansion medium containing Ham's F-10 (Cellgro) supplemented with 20% fetal bovine serum (Hyclone, SH30071.03), 1.2 mM CaCl₂ (EMD OmniPur 3000) and 1% chick embryo extract isolated from day 12 SPF Premium Fertilized White Leghorn Chicken Eggs (Charles River, North Franklin, Conn.). After 4 days of expansion, the cells were incubated with myotube differentiation medium consisting of DMEM/F12 (Thermo-Fisher) supplemented with 1% N2 supplement (Thermo-Fisher, 17502-048) and 1% insulin-transferrin-selenium (Thermo-Fisher, 41400045).

After 10 days of differentiation, the cells were dissociated into single cells using TrypLE. Subsequently, the cells were fixed with 2% PFA for 15 min and blocked with PBS containing 2% BSA, 2% horse serum, 2% goat serum and 2% Triton X-100 for 20 min. The cells were then incubated with anti-telethonin antibody (Santa Cruz, sc-25327, 1:50) at 4° C. for 2 days and with IgG goat anti-mouse secondary antibody labelled with Alexa 488 fluorophore (Invitrogen, A11017, 1:800) at room temperature for 1 h. The cells were suspended in flow buffer (PBS containing 0.2% FBS) and flow cytometry was performed using a BD FACSAria IIu (UMMS Flow Cytometry Core Laboratory). Roughly 20,000 cells were included for analysis. FlowJo software (version 7.6) was used for data analysis.

Example XV Survey of Microduplications in ClinVar and in Human Reference Populations

Annotations of pathogenicity from ClinVar (ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20180225.vcf.gz)²⁰ were combined with annotations of allele frequencies from gnomAD (console.cloud.google.com/storage/browser/gnomad-public/release/2.0.2/vcf)²¹ and from the 1000 Genomes Project (ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/f) using the annotate function in bcftools⁴¹ (1.9), after decomposition of multi-allelic sites and normalization of variants with vt⁴² (v0.5772) against a reference genome (broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta). Most analyses were restricted to the intervals in ftp.broadinstitute.org/pub/ExAC_release/release1/resources/exome_calling_regions.v1.interval_list.

Insertions were extracted using vt (view -h -f "VTYPE==INDEL&&DLEN>0"); then duplications were identified, repeat units counted, internal shift-symmetries determined, and flanking genomic regions extracted using a modified version of the vt function annotate_indels. Additional processing (filtering, finding maximal allele frequencies among different populations, scanning for PAM sites and so on) was performed using R (3.4.3), including the VariantAnnotation (1.24.5) package⁴³.
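Deciding whether an insertion is a tandem duplication, including cases where the insertion point sits inside the repeated unit (a shift-symmetry), can be sketched in Python. This is an illustrative check and not the modified annotate_indels function; the function name and the 0-based offset convention are assumptions.

```python
def duplicated_unit(reference, offset, inserted):
    """Return (start, unit) if inserting `inserted` after `offset` reference
    bases is equivalent to duplicating reference[start:start + len(inserted)]
    in tandem; return None otherwise.

    When several shifted representations exist, the leftmost one is returned.
    """
    n = len(inserted)
    if n == 0:
        return None
    mutant = reference[:offset] + inserted + reference[offset:]
    first = max(0, offset - n)
    last = min(offset, len(reference) - n)
    for start in range(first, last + 1):
        unit = reference[start:start + n]
        # Rebuild the allele that a tandem copy of this unit would produce.
        candidate = reference[:start + n] + unit + reference[start + n:]
        if candidate == mutant:
            return start, unit
    return None
```

For example, inserting GATA after the sixth base of AAGATACC is reported as a duplication of the GATA unit starting at offset 2, whereas inserting an unrelated sequence returns None.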

Exact tandem repeats in the reference genome were identified using the Tandem Repeats Finder program (4.09)⁴⁴ and checked for exact matches elsewhere in the genome with bwa fastmap (0.7.17)⁴⁵. Examples of different lengths were manually selected to use for the tests of collapse of endogenous microduplications.
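What counts as an exact tandem repeat of a given unit length can be shown with a naive scan. This is a toy stand-in for illustration only; it does not reproduce the Tandem Repeats Finder algorithm, and the function name is hypothetical.

```python
def exact_tandem_duplications(sequence, unit_length):
    """Return 0-based start offsets where a unit of `unit_length` bases is
    immediately followed by an identical copy of itself."""
    k = unit_length
    if k <= 0:
        return []
    return [
        i
        for i in range(len(sequence) - 2 * k + 1)
        if sequence[i:i + k] == sequence[i + k:i + 2 * k]
    ]
```

Adjacent hits that differ by one position describe the same repeat in shifted registers, so a real survey would merge them before counting.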

Example XVI Code Availability Statement

Data analysis used a combination of publicly available software and custom code, as detailed in the Methods. Custom Python (CRESA-lpp.py) and R (indel_background_filtering.R) scripts used in the Illumina data analysis and the shell script (Tcap_pacbio_analysis.sh) used for the analysis of the PacBio data are hosted on GitHub (github.com/locusliu/PCR_Amplicon_target_deep_seq). Scripts for the bioinformatic analysis of pathogenic microduplications are hosted at rambutan.umassmed.edu/duplications/.

REFERENCES

-   1. Moreira, E. S. et al. Limb-girdle muscular dystrophy type 2G is caused by mutations in the gene encoding the sarcomeric protein telethonin. Nat. Genet. 24, 163-166 (2000).
-   2. El-Chemaly, S. & Young, L. R. Hermansky-Pudlak syndrome. Clin. Chest Med. 37, 505-511 (2016).
-   3. Sfeir, A. & Symington, L. S. Microhomology-mediated end joining: a back-up survival mechanism or dedicated pathway? Trends Biochem. Sci. 40, 701-714 (2015).
-   4. Bae, S., Kweon, J., Kim, H. S. & Kim, J.-S. Microhomology-based choice of Cas9 nuclease target sites. Nat. Methods 11, 705-706 (2014).
-   5. Kim, S.-I. et al. Microhomology-assisted scarless genome editing in human iPSCs. Nat. Commun. 9, 939 (2018).
-   6. Hisano, Y. et al. Precise in-frame integration of exogenous DNA mediated by CRISPR/Cas9 system in zebrafish. Sci. Rep. 5, 8841 (2015).
-   7. Sakuma, T., Nakade, S., Sakane, Y., Suzuki, K. T. & Yamamoto, T. MMEJ-assisted gene knock-in using TALENs and CRISPR-Cas9 with the PITCh systems. Nat. Protocols 11, 118-133 (2016).
-   8. Bertz, M., Wilmanns, M. & Rief, M. The titin-telethonin complex is a directed, superstable molecular bond in the muscle Z-disk. Proc. Natl Acad. Sci. USA 106, 13307-13310 (2009).
-   9. Nigro, V. & Savarese, M. Genetic basis of limb-girdle muscular dystrophies: the 2014 update. Acta Myol. 33, 1-12 (2014).
-   10. Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765-771 (2018). doi:10.1038/nbt.4192
-   11. Caron, L. et al. A human pluripotent stem cell model of facioscapulohumeral muscular dystrophy-affected skeletal muscles. Stem Cells Transl. Med. 5, 1145-1161 (2016).
-   12. Oh, J. et al. Positional cloning of a gene for Hermansky-Pudlak syndrome, a disorder of cytoplasmic organelles. Nat. Genet. 14, 300-306 (1996).
-   13. Richmond, B. et al. Melanocytes derived from patients with Hermansky-Pudlak syndrome types 1, 2, and 3 have distinct defects in cargo trafficking. J. Invest. Dermatol. 124, 420-427 (2005).
-   14. Brantly, M. et al. Pulmonary function and high-resolution CT findings in patients with an inherited form of pulmonary fibrosis, Hermansky-Pudlak syndrome, due to mutations in HPS-1. Chest 117, 129-136 (2000).
-   15. Bolukbasi, M. F. et al. Orthogonal Cas9-Cas9 chimeras provide a versatile platform for genome editing. Nat. Commun. 9, 4856 (2018).
-   16. Sharma, S. et al. Homology and enzymatic requirements of microhomology-dependent alternative end joining. Cell Death Dis. 6, e1697 (2015).
-   17. Wang, M. et al. PARP-1 and Ku compete for repair of DNA double strand breaks by distinct NHEJ pathways. Nucleic Acids Res. 34, 6170-6182 (2006).
-   18. Dutta, A. et al. Microhomology-mediated end joining is activated in irradiated human cells due to phosphorylation-dependent formation of the XRCC1 repair complex. Nucleic Acids Res. 45, 2585-2599 (2017).
-   19. Zetsche, B. et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell 163, 759-771 (2015).
-   20. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062-D1067 (2018).
-   21. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291 (2016).
-   22. Komor, A. C., Badran, A. H. & Liu, D. R. CRISPR-based technologies for the manipulation of eukaryotic genomes. Cell 168, 20-36 (2017).
-   23. Kim, E. et al. In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni. Nat. Commun. 8, 14500 (2017).
-   24. Edraki, A. et al. A compact, high-accuracy Cas9 with a dinucleotide PAM for in vivo genome editing. Mol. Cell 73, 714-726.e4 (2019).
-   25. Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485 (2015).
-   26. Bolukbasi, M. F. et al. DNA-binding-domain fusions enhance the targeting range and precision of Cas9. Nat. Methods 12, 1150-1156 (2015).
-   27. Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57-63 (2018).
-   28. van Overbeek, M. et al. DNA repair profiling reveals nonrandom outcomes at Cas9-mediated breaks. Mol. Cell 63, 633-646 (2016).
-   29. Suzuki, K. et al. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 540, 144-149 (2016).
-   30. Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646-651 (2018).
-   31. Rittié, L. & Fisher, G. J. Isolation and culture of skin fibroblasts. Methods Mol. Med. 117, 83-98 (2005).
-   32. Stadler, G. et al. Establishment of clonal myogenic cell lines from severely affected dystrophic muscles - CDK4 maintains the myogenic population. Skelet. Muscle 1, 12 (2011).
-   33. Kearns, N. A. et al. Cas9 effector-mediated regulation of transcription and differentiation in human pluripotent stem cells. Development 141, 219-223 (2014).
-   34. Brinkman, E. K., Chen, T., Amendola, M. & van Steensel, B. Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 42, e168 (2014).
-   35. Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614-620 (2014).
-   36. Blankenberg, D. et al. Manipulation of FASTQ data with Galaxy. Bioinformatics 26, 1783-1785 (2010).
-   37. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568-576 (2012).
-   38. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094-3100 (2018).
-   39. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24-26 (2011).
-   40. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68-74 (2015).
-   41. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).
-   42. Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. Bioinformatics 31, 2202-2204 (2015).
-   43. Obenchain, V. et al. VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants. Bioinformatics 30, 2076-2078 (2014).
-   44. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573-580 (1999).
-   45. Li, H. Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics 28, 1838-1844 (2012).

We claim:
 1. A programmable nuclease with sequence-specific DNA-binding affinity for a genomic locus, wherein said genomic locus comprises a microduplication mutation.
 2. The programmable nuclease of claim 1, wherein said nuclease further comprises a protospacer adjacent motif binding domain having said sequence-specific DNA-binding affinity for said genomic locus protospacer adjacent motif sequence.
 3. The nuclease of claim 1, wherein said nuclease is selected from the group consisting of a Class II CRISPR single effector nuclease, a Cas9 nuclease, a Cas12 nuclease, a zinc finger nuclease and a transcription activator-like effector nuclease.
 4. The nuclease of claim 1, wherein a duplicate sequence of said microduplication mutation has a length of between 1-40 nucleotides.
 5. The nuclease of claim 1, wherein a duplicate sequence of the microduplication mutation has a length of greater than 40 nucleotides.
 6. A microduplication of claim 1, where the microduplication is in the form of a direct repeat.
 7. A method, comprising: a) providing: i) a subject comprising a genomic locus having a microduplication mutation; and ii) a pharmaceutical formulation comprising a programmable nuclease, said nuclease having sequence-specific DNA-binding affinity for a region that contains said microduplication mutation within said genomic locus; and b) administering said pharmaceutical formulation to said subject under conditions such that said microduplication mutation is replaced with a wild type sequence of said genomic locus.
 8. The method of claim 6, wherein said wild type sequence replacement comprises a correction through DNA repair.
 9. The method of claim 7, wherein said DNA repair correction is performed without assistance of an exogenously supplied donor DNA.
 10. The method of claim 7, wherein said nuclease further comprises a protospacer adjacent motif binding domain having said sequence-specific DNA-binding affinity for said genomic locus protospacer adjacent motif sequence.
 11. The method of claim 7, wherein said genomic locus is selected from the group consisting of TCAP, HPS1, HEXA, DOK7 and RAX2.
 12. The method of claim 7, wherein said subject further exhibits at least one symptom of a disease caused by said target gene microduplication mutation.
 13. The method of claim 7, wherein said disease is selected from the group consisting of limb-girdle muscular dystrophy 2G, Hermansky-Pudlak syndrome, Tay-Sachs disease, familial limb-girdle myasthenia and cone-rod dystrophy 11.
 14. The method of claim 7, wherein said administering further reduces said at least one symptom of said disease.
 15. The method of claim 7, wherein said nuclease is selected from the group consisting of a Class II CRISPR single effector nuclease, a Cas9 nuclease, a Cas12 nuclease, a zinc finger nuclease and a transcription activator-like effector nuclease.
 16. The method of claim 7, wherein said pharmaceutical formulation comprises an adeno-associated virus encoding said programmable nuclease.